Lag Length and Mean Break in Stationary VAR

Lag Length and Mean Break in Stationary VAR
Minxian Yang1 School of Economics The University of New South Wales Sydney 2052 Australia
abstract Break tests are often performed in practice as a specication check on a model selected by an information criterion. In the context of a vector autoregression (VAR) model, when the lag length is selected ignoring a possible break in the mean, it is not clear whether the subsequent break tests have desired power properties since the estimated lag length may be inconsistent. We show that the estimated lag length is at worst biased upwards asymptotically if a break in the mean is ignored. Thus the large-sample power properties of the break tests are not aected by conditioning on the estimated lag length. The above modelling approach is compared with two other approaches: testing the break prior to selecting the lag length and determining the break and lag length simultaneously. Monte Carlo experiments show that the third approach tends to select a spurious break too often and the rst two approaches exhibit stable performances. JEL classication: C12,C13,C32 Keywords: Autoregression lag length; Mean break; Information criteria; Consistency
1
The rst draft of this article is written during my visit to the Department of Economics at UCSD, whose hospitality
is greatly appreciated. I wish to thank Graham Elliott and Clive Granger for their comments. Financial support from the Australian Research Council is gratefully acknowledged.
Telephone: +61-(2)-9385-3353 Facsimile: +61-(2)-9313-6337 Email: m.yang@unsw.edu.au
First draft: March 2001 This draft: April 2001
Introduction
Two of the important issues in constructing a vector autoregression (VAR) model are: determining the models lag length and checking the models parameter stability. When there is no structural break, the lag length of the VAR is usually estimated by minimizing an information criterion. Commonly used criteria include AC of Akaike (1969), SC of Schwarz (1978) and HQ of Hannan and Quinn (1978). On the other hand, when the lag length is known, the parameter stability may be tested by employing various testing procedures, such as those of Andrews (1993) and Andrews and Ploberger (1994). The issue addressed in this paper is how one determines the models specication in presence of a possible break in the mean, more specically, whether one should estimate the lag length rst, or check parameter stability rst, or determine both simultaneously. In the subsequent discussion, we assume that data are generated from a true VAR that is referred to as the correct model. This paper focuses on the mean break mainly because the break in the mean has severe impact on forecast performance on the one hand and its simplicity helps to highlight its interaction with the lag length selection on the other. We shall consider the following three approaches that can be used to determine the lag length and possible break. The rst approach is commonly adopted in practice where break tests are conducted as a specication check on an estimated model. For the VAR model, the approach amounts to rst estimating the lag length ignoring the possible break and then performing the break tests based on the estimated lag length. This approach is suitable for the cases where the investigator holds a prior belief that no break is present. In these cases, as the possible break is ignored at the time of selecting the lag length, it is desirable to know how the inaccuracy of the lag length estimator aect the subsequent break tests, the chance of getting the correct model, and the 1
forecast performance of the resultant model. While this approach is commonly used, to the authors knowledge, its asymptotic behavior has not been considered in the literature. In this paper, we justify the asymptotic validity of the subsequent break tests by showing that the lag length estimated by SC or HQ is consistent when the ignored mean break is small and is asymptotically biased upwards when the ignored mean break is large (Assumption 1(c) and Theorem). Hence the large-sample power properties of the break tests will not be aected as the tests are conducted conditional on a lag length that tends to be larger than the correct length at the worst. The second approach is to perform the break tests rst on the VAR with a maximal lag length that is known to the investigator and then estimate the lag length based on the result of the break tests (a break dummy may be included in the model). For this approach, the break tests are carried out on a correct (but not parsimonious) model and will have correct power properties for large samples. Consequently, the lag length estimated with SC or HQ will be consistent when the mean break is present. The probability that the approach selects the correct model depends on the nominal size of the break tests and the presence (absence) of the mean break. The third approach is to determine both the break and lag length simultaneously using an information criterion. This approach diers from the previous two in that hypothesis testing is not involved. Given the result of Bai (1997), this approach can also be justied as being consistent because, when the break is consistently estimated, SC or HQ will consistently select the correct lag length. The amount of computation involved in this approach is much larger than what is required by the previous two. In this framework, the rst approach may be regarded as a strategy of going from specic to general and the opposite may be labelled to the second and third
approaches. Given the large-sample properties of the three approaches, it is interesting and important to investigate their nite sample performances. Monte Carlo experiments are used to assess the nite sample properties of the three approaches in a twodimensional VAR model. The performance of each approach is assessed by the probability of identifying the correct model and the mean square error (MSE) of one-step forecast. While the forecast performance is the ultimate criterion in comparing models, the probability of getting the correct model also appears important for constructing structural VARs [e.g. Blanchard and Quah (1989)] where the models dynamics is crucial. The simulation results can be summarized as follows. Firstly, the third approach tends to detect a spurious break too often when there is no break in the data generating process. Secondly, SC appears sensitive to the magnitude of lag coecients and severely underestimate the lag length in small and medium samples when the coecient magnitude is small. Finally, the rst and second approaches in combination with AC and HQ have comparable performances that appear stable across all experiments. The probability of the rst approach identifying the correct model is slightly higher than that of the second approach whereas the forecast MSE of the rst approach is slightly larger than that of the second approach. Recent related work includes Kiviet and Dufour (1997) and Rossi (2000). The former considered exact testing procedures that can be used to determine the lag length and structural change in the framework of single-equation autoregressive distributed lag models. The latter proposed jointly testing procedures for model selection with structural break in a GMM setting. The paper is organized as follows. Section 2 lays out the framework and asymptotic considerations. The simulation design and results are given in Section 3. Section
4 concludes the paper. The proof is contained in Section 5.
2
2.1
Model and Large Sample Consideration

Model
Suppose that we observe the p-dimensional time series {yt }n t=K that can be decomposed into two unobservable parts
o yt = yt + n 1t ( ),
(1)
where n and K are respectively the eective sample size and the number of reserved
o observations. The rst part yt is generated by the vector autoregressive (VAR) pro-
cess
o (L)yt = c + t ,
(2)
where (L) = Ip
k j=1
j Lj with L being the lag operator and k standing for the
lag length; the random error term t is iid N(0, ) with a positive denite variance . The second part n 1t ( ) represents a possible break in the mean of the series, where the indicator 1t ( ) assumes value one if t and zero otherwise. Additionally, we make the following assumptions. Assumption 1 (a) The roots of |(z)| = 0 are outside the unit circle. (b) The true break fraction r0 = 0 /n = [r1 , r2 ] is xed where 0 < r1 < r2 < 1. (c) n = /n with being a constant vector and 0. (d) The true lag length is k0 . (e) A maximal lag length km k0 is known.
o Part (a) of the assumption requires yt being stationary so that yt itself is piece-
wise stationary. Part (b) is a standard assumption used to derive asymptotic theory, 4
where the interval is usually chosen to be [.15, .85]. Part (c) states that the change in the constant term may shrink towards zero as the sample size increases. It is used to dene small breaks that are dicult to detect in both small and large samples. Part(e) assumes the knowledge of a maximal lag length, which is again a standard assumption in the literature. We can write the DGP (1) and (2) as
k0
yt = c +
j=1
j ytj + bt (0 ) + t
(3)
where
However, the small episode of dynamics in bt (0 ) will be replaced by a single dummy variable in the following model
k
0, n , bt (0 ) = (I 1 )n , p . . . (1) , n
t < 0 t = 0 t = 0 + 1 . . . t 0 + k0 .
yt = c +
j=1
j ytj + d1t ( ) + errort ,
(4)
which is used for statistical inferences. The hypotheses regarding the mean break in (4) are stated as H0 : d = 0 and H1 : d = 0.
2.2
Model Selection
When there is no break (n = 0) or the break point (0 ) is known, the lag length k0 may be estimated by minimizing certain information criteria. We shall consider the criteria of Akaike (1969), Schwarz (1978) and Hannan and Quinn (1978), denoted as AC, SC and HQ respectively. The estimators of k0 are minimizers of AC(k) = log |k | + 5 2 p(pk + i), n
log n SC(k) = log |k | + p(pk + i), n h log log n HQ(k) = log |k | + p(pk + i), n respectively, where log is the natural logarithm; k is the maximum likelihood estimator of with lag length k; h > 2 is a constant (h = 2.02 is used in the simulation in Section 3); i = 2 if a break-dummy variable is included and i = 1 otherwise. It is well known that, when there is no break or the break point is known, both the SC and HQ estimators are strongly consistent. Derived from the principle of maximizing the posterior probability, SC is known to have the tendency of under-estimating the lag length in small samples. For selecting the lag length of AR models, HQ (based on the law of the iterated logarithm) gives the lowest possible penalty term that still ensures the estimators consistency. Obviously, the fact that AC is not consistent does not rule out the possibility that the AC estimator might outperform SC or HQ in small samples. On the other hand, if the true lag length k0 is known, a break test may be performed to check if there is a break. For instance, the supW test documented in Andrews (1993) can be used, where supW = max W = max W[nr] ,
r
[nr1 ] [nr2 ]
and W is the usual Wald statistic for testing the null of d = 0 against the alternative of d = 0 at a chosen break point in (4). Here [x] stands for the largest integer part of x. Naturally, the family of optimal tests of Andrews and Ploberger (1994) can also be used. For the purpose of this paper, the supW test has the advantage that the break point (or fraction) is estimated at the time of computing supW . We shall be interested in the cases where neither 0 nor k0 is known. Under such circumstances, both the lag length and the break point need to be determined
statistically. We can proceed with one of the following approaches. In any case, the maximal lag length km is required. 1. Determine the lag length rst by one of the criteria, ignoring the possible break. Then perform the break test as a specication check, using the estimated lag length. 2. Use km to perform the break test (if a break is detected, include a dummy variable in the model). Then determine the lag length by one of the criteria. 3. Estimate k0 and 0 at the same time by searching the two dimensional grid {(k, ) : k = 1, . . . , km ; = [nr1 ], . . . , [nr2 ]} to minimize one of the criteria and comparing with the minimized criterion where the break dummy is excluded.
2.3
Large Sample Consideration
For the rst approach, a possibly mis-specied model is used to obtain the lag length estimator, which may be inconsistent. It is critical to clarify how this inconsistency aects a subsequent break test, at least, for large-sample cases. Intuition suggests that the estimated lag length will be biased towards the direction that detecting the break becomes more dicult. However, as long as the lag length estimator is biased upwards (as we will show in the theorem below), the large-sample power properties of the break test will not be aected. For the second approach, the model is correctly specied for the purpose of testing parameter stability, although its lag length is longer than necessary. For large samples, since the presence or absence of a break is correctly detected by the break tests and the lag length subsequently determined by SC or HQ will be consistent. The asymptotic probability that the approach nds the correct model depends on the nominal size of the break test and the presence or absence of the break. For 7
instance, the asymptotic probability of obtaining the correct model is one minus the nominal size of the break test when there is no break, and is one when a break is present and the break test is consistent. The third approach diers from the rst two in that hypothesis testing is not involved and the amount of required computation is signicantly larger. Given the fact that the break fraction r0 and other parameters in the model can be consistently estimated for 0 <
1 2
[Bai (1997)], the (weak) consistency of the lag length
estimators from SC(k) and HQ(k) follows because every possible combination of the lag length and break point is searched. The second and third approaches are consistent in the sense that the probability of correctly identifying the true model tends to one (or one minus the nominal size of the break test) as the sample size goes to innity. However, the asymptotic behavior of the rst approach, which is commonly used in practice, has not been considered in the literature. Therefore, the theorem below may serve as an asymptotic description of the rst approach. Theorem 1 Suppose the break in the mean is ignored when the lag length is estimated. Let k(SC) and k(HQ) be the minimizers of SC(k) and HQ(k) respectively. Under Assumption 1,
p p 1 (1) if 2 , then k(SC) k0 and k(HQ) k0 as n ;
(2) if 0 <
1 2
and q() = 0, then the probability that k(SC) and k(HQ) are
(3) for any 0, the probability that k(SC) and k(HQ) are smaller than k0 tends to 0 as n . When the magnitude of the break n = /n is small ( 1 ), the consistency 2 8
greater than k0 tends to 1 as n , where q() is dened in the proof;
of the SC and HQ estimators are retained. When the break magnitude is large (0 < 1 ), the lag length estimators are asymptotically biased upwards in general. 2 The probability that the break tests are carried out on model with a lag length less than k0 is negligible for large samples. Hence, conditional on the given the lag length, the break tests pertain the desired large sample size and power properties. In this sense, the rst approach is also asymptotically valid. While the lag length estimator is generally inconsistent for large breaks in the mean, this approach does not necessarily yield poor performance in nite samples, since SC and HQ tend to underestimate the true lag length in nite samples.
Simulation Comparison
As analyzed in the previous Section, of the three approaches considered, the rst one may yield an inconsistent lag estimator that is asymptotically biased upwards. However, as SC and HQ tend to underestimate the lag length, the performance of the rst approach need not be inferior to that of the second or third approach in nite samples. Therefore it appears important and interesting to investigate the nite sample performance of the three approaches in controlled simulation experiments.
3.1
Data Generating Process (DGP)
The DGP (1) and (2), i.e.,

o yt = yt + n 1t ( ), o (L)yt = c + t ,
are used for simulation experiments. Specically, the dimension of the VAR p = 2, the true lag length k0 = 2, the maximal lag length km = 5, the true break fraction r0 = 0.7, the constant term c = 0, and the eective sample size n = 100, 300, 500.
The variance of the error term and the break magnitude are chosen as 1 .1 1 4 100 , 2 = n = F , n 0 1
where the scalar = 0, .5, 1, the vector F contains the diagonal elements of (1)1 2 . In other words, the break magnitude is proportional to the square root of the long-run
o variance of yt . For instance, when = .5, the break magnitude is 50% of the square o root of the long-run variance of yt scaled by the factor
4
100 , n
which is introduced to
keep the power of the break test less than one. We consider two DGPs. The rst, DGP1, is dened by .6 0 .2 0 L2 L (L) = I2 .1 .1 .1 .4
The smallest characteristic roots, which are closest to the unit-circle, of both DGPs are the same (= 1.193). However the second lag of yt in DGP1 induces stronger eects on yt than that of DGP2.
and the second, DGP2, is dened by .2 0 .6 0 L2 . L (L) = I2 .1 .2 .1 .3
3.2
Simulation
For each n, (n + 100) observations are drawn from DGP1 (or DGP2) with zero initial values, of which the rst 95 observations are discarded and the last (n + 5) observations constitute a sample for inference. For each sample, three approaches are used to determine the lag length and the break, which are labelled as A1, A2 and A3 respectively and dened as follows.
10
A1: First estimate the lag length with an information criterion, ignoring the possible break. Then perform a test to determine the presence of the break, using the estimated lag length. A2: First perform a test to determine the presence of the break, using the maximal lag length. Then estimate the lag length with an information criterion. A3: Determine the lag length and break simultaneously with an information criterion (by searching both the lag length and the break point). For each approach, three estimates of the lag length are computed from the information criteria AC, SC, and HQ. For the third approach A3, the presence of a break is also determined by the three information criteria. For the rst and second approaches A1 and A2, the supW test1 on the space = [.15, .85] at the 5% level is used to check the presence of the break. The asymptotic critical value (= 11.79), taken from Andrews (1993), is used for the case with = 0 (no break), from which the empirical critical value is computed and subsequently used for the cases with = 0. In the case that a break is detected, a corresponding dummy variable is included in the model. The number of Monte Carlo replications for each specication of the DGPs is 5,000.
3.3
Results and Comparison
Simulation results are summarized in Table 1 and Table 2 (for DGP1 and DGP2 respectively). For each approach and each information criterion, reported in the tables are the following four columns: relative frequencies of correct lag lengths, detected breaks and correct models (i.e. both the lag length and break are correct)
1
The optimal tests, aveW and expW , of Andrews and Ploberger (1994) are included in the
simulation but not reported. In most cases expW and supW have similar size and power properties while aveW is less powerful than expW and supW
11
and the mean squared errors (MSE) of one-step forecasts. The relative frequency of detected breaks (the second column) may be regarded as the size of the break test for the case with = 0 and the size-adjusted power for the cases with = 0. While the results are not clear-cut, some patterns can be observed. Firstly, A3 performs poorly for the cases with n = 100 and = 0. The chance that A3 detects a spurious break in the model is much larger than the chance that the other two approaches do, no matter which information criterion is used. As a result, the chance that A3 identies the correct model is much less than what A2 and A3 can achieve. For instance, when there is no break in DGP1 and n = 100, the relative frequency of detecting a break is 0.673 by A3 but only 0.191 by A1 when HQ is used. The reason is that for small sample sizes the gain from falsely including a break dummy in the model is likely to outweigh the penalty terms in the information criteria. While A1 and A2 (employing the 5% supW test to determine the break) have better performances for the case with n = 100 and = 0, the size distortion of the supW test is still quite large 2 , suggesting that certain small sample adjustment for the test is needed. Moreover, A3 is generally outperformed by the other two approaches in term of forecast MSE for n = 100 and 300. Out of 12 cases in Tables 1 and 2 (rst two panels), there is only one case where A3 with SC achieves the smallest MSE. When n = 500, as the penalty term in SC becomes relatively signicant, A3 combined with SC appears to perform well (but not necessarily better than A1 and A2) for DGP1 in terms of the relative frequency of identifying the correct model. Give SCs sensitivity to the DGPs coecient matrix (see blow), A3 appears to be the worst among the three approaches when n = 100 and 300. On the other hand, A1 and A2 have similar performance and are less sensitive to the absence or presence of the break. On this ground, it seems that A3 should not be recommended for practical
2
Similar size distortions are also observed for aveW and expW .
12
use for small samples. Secondly, the performance of SC appears to be very sensitive to the VARs parameters when n = 100. As the magnitude of the second lag coecient matrix of DGP2 is smaller than that of DGP1, the relative frequency of SC getting the correct lag length in DGP2 drops to less than 12% from as high as 57% in DGP1, regardless of which approach is used. The SC lag estimator is mainly downward-biased and concentrated on the rst lag. For example, in DGP2 with A1 and n = 100, more than 83% of the SC lag estimates are 1 while the true lag length is 2. The bias is most severe with A3 and least severe with A1. This phenomenon may be explained by the fact that the SC penalty term for including one more lag is too heavy (p times larger than that of including a break dummy) to appreciate the gain from doing so, since the signal carried by the second lag is not strong enough in DGP2. Consequently, the chance for SC to select the correct model deteriorates signicantly (e.g., 10.9% in Table 2 compared with 54.9% in Table 1 for the case of A1 with n = 100 and = 1). For the cases with n = 300, the downward bias of the SC lag estimator becomes less severe but is still apparent (e.g., 81.2% in Table 2 compared with 100% in Table 1 for the case of A1 with n = 300 and = 0.5). As sample size increases to 500, the bias is no longer signicant. Surprisingly or not surprisingly, the bias does not seem to impede the forecast performance of SC. In Table 2, the forecast MSEs associated with SC are generally smaller than the MSEs of AC and HQ for each of three approaches. It appears that the combination of A2 and SC generally has the smallest forecast MSEs for n = 100 and 300, where SC is most sensitive to changes in the DGPs coecient matrix. This observation is consistent with the belief that the forecast MSE of a mis-specied parsimonious model can be smaller than that of the correct model (the data generating process). Thirdly, to a lesser degree, HQ suers the same bias problem as SC for n = 100
13
but its bias becomes insignicant when n = 300 and 500. Among the three criteria, AC (in combination with A1 and A2) is least sensitive to the changes in the DGPs coecient matrix and exhibits the greatest relative frequencies of selecting the correct model in DGP2 (but not in DGP1) when n= 100. However, for n = 300 and 500 (and 100 in DGP1), HQ (in combination with A1 and A2) appears to have the highest probability of selecting the correct model in both DGP1 and DGP2, excluding A3, with forecast MSEs very close to the smallest in each case. It appears that HQ has a balanced performance for both small and large samples. Fourthly, the performance of the break test with A1 appears to be comparable with (sometimes better than) that of A2, in terms of empirical size and power. The chance of A1 obtaining the correct model is higher than that of A2 in most cases, regardless of the information criterion used. It seems that A2s strategy of going from general to specic does not necessarily lead to a better chance of getting the correct model. On the other hand, A2 does produce a slightly better forecast performance in many cases. For example, when n = 100 and 300, out of 12 cases in Tables 1 and 2 there are 11 cases where A2 yields the smallest forecast MSEs. An alternative method [Granger and Newbold (1986)] is also used to assess the forecast performance of A1 and A2. Let f1i and f2i be respectively the forecasts of A1 and A2 and yi be the true vector to be forecast in the ith replication of the simulation. Consider the combination of f1i and f2i : fi () = f1i + (1 )f2i with 0. If is chosen to minimize the MSE:
1 R R i=1 [fi ()
yi ] [fi () yi ], where R is the number
of replications, then is a measure of A1s forecast performance relative to A2. For instance, >
1 2
indicates that A1 forecast better than A2 in the simulation. It turns
out that this measure is consistent with the MSE in all experiments. Therefore the results about are not reported. In summary, A3 tends to detect a spurious break too often, SC is sensitive to
14
the lag coecients of DGP, A1 and A2 in combination with AC and HQ exhibit stable performances in small and median samples. Specically, for small and median samples, while A2 seems more suitable for forecast, A1 (with better chance of getting the correct model) appears to be appropriate for the cases where the VAR is used to subtract impulse responses [e.g. Blanchard and Quah (1982)].
Conclusion
The framework considered in this paper is probably the simplest possible. Obvious extensions include cointegrated VARs and the cases where there is a break in the autoregressive coecients, although it has a minor impact on the models forecast performance in comparison with the mean break. It becomes more complicated to consider the asymptotic properties of lag length estimators when a break is ignored in a cointegrated VAR because it involves both trending variables and non-trending variables. The issues considered in this paper remain important to these cases and are under investigation.
Appendix
Proof of Theorem As the break is ignored, the model used for inference can be written as yt = Awt + Bxt + t , where wt = [1, yt1 , . . ., ytk0 ] and xt = [ytk0 1 , . . ., ytk ] if k k0 (or wt = [1, yt1 , . . ., ytk ] and xt = [ytk1 , . . ., ytk0 ] if k < k0 ); A and B collect the coecients of the components in wt and xt respectively; t = bt (0 ) + t . Stacking n observations together, we rewrite the model as y = Aw + Bx +
15
where y = [y1 , . . ., yn ], w = [w1 , . . ., wn ], x = [x1 , . . ., xn ] and = [1 , . . ., n ]. The following limits can be shown to hold
1 n
n
yti ytj
t=1 n
1 n 1 n1
n
t ytj
t=1
bt (0 )ytj
t=1
ji + (1 r0 )( + + ), = 0 p ji () , , >0 ji j + [Vj (1) Vj (r0 )] , = 0 d j () , , >0

j
1 n 1 n1 1 n12
n
ytj
t=1 n
(1 r0 )(1)( + ) , d j () (1 r )(1) ,
0 p
=0
>0
t=1
bt (0 )
g (1 r0 )(1) ,
+ (1 r0 ), = 0 () , , >0
bt (0 )bt (0 )
t=1
h (1 r0 )(1) (1) ,
o o o where = E(yt ), ji = E(yti ytj ) with integers i 0 and j > 0, vec(j ) is a normal
vector with mean zero and variance (0 ), Vj () is a p-vector Brownian motion with variance . The case with k k0 is considered rst, where the true value for B is B = 0. In this case the estimated variance then can be written as 1 1 k = [P P x (xP x )1 xP ] = k0 P x (xP x )1 xP , n n where P = In w (ww )1 w. Using the above limits, it can be veried that + h ga1 1 a g , = 0 1 p 11 1 P () , , n >0 1 =0 ga2 ga1 11 12 , 1 d 1 P x () 0<< ga+ ga+ 11 12 , 1 2 n1 N2 + ga+ (N1 + ga+ )1 12 , = 1 1 11 2 2 1 P x n 1 xP x n
1 + N2 N1 11 12 , p d
1 2
>
1 , 2
22.1 22 21 1 12 , 11
16
+ where a1 = [1, + , . . ., + ]1(pk0 +1) , a2 = [ + , . . ., + ]1p(kk0 ) , a+ and a2 1
are respectively obtained from a1 and a2 by deleting from them; 11 , 12 and 22 are respectively the probability limits of n1 ww , n1 wx and n1 xx , consisting of ji ()s and ()s; N1 = [ , 1 (), . . ., k0 ()] and N2 = [k0 +1 (), . . ., k ()] with being the weak limit of n 2 t . Summarizing, we have Ip ()1 ()1 () n2 + op (n2 ), 0 < 22.1 1 = I 1 ( 1 )1 ( 1 ) n1 + op (n1 ), = 2 22.1 2 2 p 1 Ip 1 + 1 n1 + op (n1 ), > 2 22.1 + a2 a1 1 12 , = 0 11 q() = a+ a+ 1 , 0 < < 2 1 11 12
1
1 2
1 k k0
1 For 0 < 1 , ()1 () = gs()g where s() = q()22.1 q() is a scalar with 22.1 2
1 2
When 0 <
1 2
log |k | log |k0 | = log |Ip n2 ()1 gs()g | + op (n2 ) = log[1 n2 s()g ()1 g] + op (n2 ) holds for large n, implying that as long as s() > 0 or q() = 0, the gain made by increasing the lag length from k0 to k [of order Op (n2 )] outweighs the corresponding penalty [of
1 order O(n1 log n) or O(n1 log log n)]. Therefore, for 0 < 2 , the probability that the
minimizer of SC(k) or HQ(k) is greater than k0 tends to one as n . On the other hand, when
1 2
log |k | log |k0 | = log |Ip n1 1 1 | + op (n1 ) 22.1 = n1 trace[1 1 ] + op (n1 ) 22.1
1 holds for large n, where = ( 2 ) for = 1 2
and = + for > 1 . For large n, the order 2
of the benet from increasing the lag length from k0 to k is Op (n1 ), smaller than that of the penalty term [O(n1 log n) or O(n1 log log n)]. Hence when
1 2
the probability
that the minimizer of SC(k) or HQ(k) is greater than k0 tends to zero as n .
17
Considering the case where k < k0 , we can write 1 1 k0 = [yP y yP x (xP x )1 xP y ] = k yP x (xP x )1 xP y , n n where In w (ww )1 w with wt = [1, yt1 , . . ., ytk ] and xt = [ytk1 , . . ., ytk0 ]. Then the following can be shown to hold for all 0 1 yP y n 1 yP x n 0 0 () 01 1 10 , 11 2 02 01 1 12 , 11
p p
where 01 and 02 are respectively the probability limits of n1 yw and n1 yx , consisting

1 of ji ()s and ()s. Because k k0 = 2 22.1 2 + op (1) is positive denite for large
n, the probability that log |k0 |log |k | becomes a bounded negative number tends to one as n . As the penalty term vanishes for large n, SC(k) or HQ(k) can be further reduced by increasing the lag length from k to k0 . Therefore the probability that the minimizer of SC(k) or HQ(k) is smaller than k0 tends to zero as n . 2
References
Akaike, H. (1969), Fitting autoregressive models for prediction, Ann. Inst. Statist. Math., 21, 243-247 Andrews, Donald W. (1993), Tests for parameter instability and structural change with unknown change point, Econometrica, 61, 821-856 Andrews, Donald W. and W. Ploberger (1994), Optimal tests when a nuisance parameter is present only under the alternative, Econometrica, 62(6), 13831414 Bai, Jushan (1997), Estimation of a change point in multiple regression models, Review of Economics and Statistics, 79(4), 551-563 18
Blanchard, O.J. and D.Quah (1989), The dynamic eects of aggregate demand and supply disturbances, The American Economic Review, 79, p655-673 Granger, C.W.J. and P.Newbold (1986), Forecasting economic time series, 2nd Edition, Academic Press, Orlando, Florida Hannan, E.J. and B.G. Quinn (1978), The determination of the order of an autoregression, J. Roy. Statist. Soc. Ser. B., 41, 190-195 Kiviet, J.F. and J. Dufour (1997), Exact tests in single autoregressive distributed lag models, Journal of Econometrics, 80, 325-353 Rossi, B. (2000), Optimal tests for model selection with underlying parameter instability, manuscript, Princeton University Schwarz, G. (1978), Estimating the dimension of a model, Ann. Statist., 6, 461464
19
correct lags AC SC HQ A1 A2 A3 A1 A2 A3 A1 A2 A3 .838 .814 .776 .837 .813 .773 .836 .778 .751 .705 .672 .613 .732 .689 .588 .701 .566 .502 .883 .859 .813 .896 .867 .802 .884 .802 .755
Table 1: Simulation Results for DGP1 detected breaks correct models AC SC HQ AC SC HQ .179 .209 .890 .233 .246 .985 .780 .779 1.000 .221 .209 .417 .284 .246 .776 .815 .779 .991 n = 100, = 0 .191 .693 .209 .665 .673 .096 n = 100, = .5 .251 .195 .246 .180 .920 .761 n = 100, = 1 .791 .654 .779 .597 .999 .751 n = 300, .088 .099 .400 n = 300, .481 .474 .898 n = 300, .974 .961 1.000 =0 .799 .788 .167 = .5 .406 .406 .850 =1 .808 .828 .861 .571 .551 .427 .175 .138 .415 .549 .408 .495 .723 .695 .297 .217 .194 .729 .694 .606 .755
forecast mse AC SC 6.160 6.187 6.402 13.159 13.185 13.906 36.591 36.598 37.039 6.113 6.101 6.191 13.083 13.014 13.584 36.484 36.366 36.781
HQ 6.148 6.166 6.331
13.137 13.124 13.813 36.563 36.532 36.968
A1 A2 A3 A1 A2 A3 A1 A2 A3
.874 .869 .864 .861 .861 .863 .831 .860 .861
.999 .999 .999 1.000 .999 .999 1.000 .999 .999
.990 .989 .989 .990 .989 .989 .984 .988 .989
.087 .099 .807 .475 .474 .986 .971 .961 1.000
.088 .099 .098 .482 .474 .632 .974 .961 .991
.911 .900 .901 .482 .473 .632 .974 .960 .990
.904 .893 .595 .476 .467 .888 .958 .950 .989
5.778 5.781 5.842 9.925 9.914 10.199 23.188 23.156 23.195
5.774 5.777 5.774 9.927 9.913 10.026 23.181 23.167 23.197
5.774 5.776 5.808 9.927 9.912 10.172 23.180 23.166 23.201
A1 A2 A3 A1 A2 A3 A1 A2 A3
.882 .879 .875 .868 .869 .876 .807 .874 .874
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
.993 .993 .993 .991 .993 .993 .984 .992 .993
.069 .075 .789 .645 .639 .996 .998 .996 1.000
.069 .075 .057 .649 .639 .677 .998 .996 .998
n = 500, = 0 .069 .822 .075 .818 .331 .191 n = 500, = .5 .649 .562 .639 .556 .933 .872 n = 500, = 1 .998 .805 .996 .870 1.000 .874
.931 .925 .943 .649 .639 .677 .998 .996 .998
.924 .919 .664 .643 .635 .926 .982 .989 .993
5.838 5.839 5.848 5.875 5.880 5.805 5.775 5.770 5.770
5.833 5.836 5.835 5.868 5.876 5.861 5.766 5.767 5.766
5.834 5.836 5.843 5.869 5.877 5.815 5.767 5.768 5.767
For each method and each information criterion, the rst three columns list the relative frequencies of correct lag lengths, detected breaks and correct models observed in the simulation. The last column contains one-step forecast mean squared errors.
20
correct lags AC SC HQ A1 A2 A3 A1 A2 A3 A1 A2 A3 .585 .553 .490 .626 .576 .476 .603 .455 .391 .142 .128 .111 .168 .142 .088 .143 .080 .050 .386 .360 .304 .435 .391 .285 .402 .268 .208
Table 2: Simulation Results for DGP2 detected breaks correct models AC SC HQ AC SC HQ .185 .200 .889 .259 .246 .986 .798 .778 1.000 .261 .200 .436 .373 .246 .804 .869 .778 .994 n = 100, = 0 .219 .487 .200 .467 .689 .066 n = 100, = .5 .310 .150 .246 .106 .932 .467 n = 100, = 1 .833 .470 .778 .319 .999 .391 n = 300, = 0 .085 .802 .097 .789 .403 .171 n = 300, = .5 .493 .413 .482 .406 .900 .841 n = 300, = 1 .975 .810 .963 .820 1.000 .851 n = 500, .069 .073 .328 n = 500, .658 .646 .934 n = 500, .998 .997 1.000 =0 .820 .817 .192 = .5 .566 .562 .867 =1 .805 .869 .871 .115 .111 .083 .043 .022 .058 .109 .045 .049 .314 .302 .123 .105 .066 .256 .312 .173 .208
AC 6.135 6.143 6.357 13.121 13.103 13.813 36.535 36.452 36.885
forecast mse SC 6.069 6.018 6.149 12.987 12.766 13.529 36.348 36.062 36.622
HQ 6.090 6.076 6.274 13.048 12.947 13.699 36.479 36.255 36.742
A1 A2 A3 A1 A2 A3 A1 A2 A3
.873 .869 .859 .861 .854 .853 .832 .851 .851
.737 .725 .718 .812 .724 .688 .873 .649 .626
.954 .940 .928 .964 .938 .924 .970 .917 .908
.083 .097 .804 .482 .482 .987 .972 .963 1.000
.103 .097 .121 .523 .482 .661 .978 .963 .993
.676 .664 .661 .398 .306 .398 .852 .617 .620
.865 .852 .568 .473 .440 .827 .945 .881 .908
5.771 5.775 5.840 9.931 9.925 10.202 23.189 23.175 23.209
5.746 5.746 5.747 9.934 9.880 10.007 23.181 23.111 23.143
5.767 5.769 5.799 9.933 9.916 10.176 23.188 23.170 23.210
A1 A2 A3 A1 A2 A3 A1 A2 A3
.879 .878 .873 .865 .868 .871 .807 .872 .871
.974 .973 .972 .987 .971 .967 .994 .964 .960
.992 .991 .991 .990 .990 .990 .984 .990 .989
.069 .073 .785 .653 .646 .996 .998 .997 1.000
.071 .073 .057 .660 .646 .678 .998 .997 .998
.906 .902 .920 .649 .621 .648 .992 .960 .959
.924 .919 .667 .651 .640 .924 .982 .986 .989
5.847 5.849 5.856 5.881 5.887 5.817 5.784 5.781 5.782
5.839 5.842 5.840 5.872 5.880 5.866 5.776 5.774 5.773
5.842 5.844 5.852 5.874 5.882 5.823 5.777 5.778 5.777
For each method and each information criterion, the rst three columns list the relative frequencies of correct lag lengths, detected breaks and correct models observed in the simulation. The last column contains one-step forecast mean squared errors.
21

Lag Length and Mean Break in Stationary VAR

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lag Length and Mean Break in Stationary VAR

Uploaded by

Copyright:

Available Formats

Lag Length and Mean Break in Stationary VAR

Telephone: +61-(2)-9385-3353 Facsimile: +61-(2)-9313-6337 Email: m.yang@unsw.edu.au

First draft: March 2001 This draft: April 2001

4 concludes the paper. The proof is contained in Section 5.

Model and Large Sample Consideration

j Lj with L being the lag operator and k standing for the

j ytj + d1t ( ) + errort ,

Large Sample Consideration

[Bai (1997)], the (weak) consistency of the lag length

greater than k0 tends to 1 as n , where q() is dened in the proof;

Data Generating Process (DGP)

The DGP (1) and (2), i.e.,

and the second, DGP2, is dened by .2 0 .6 0 L2 . L (L) = I2 .1 .2 .1 .3

Results and Comparison

yi ] [fi () yi ], where R is the number

indicates that A1 forecast better than A2 in the simulation. It turns

ji + (1 r0 )( + + ), = 0 p ji () , , >0 ji j + [Vj (1) Vj (r0 )] , = 0 d j () , , >0

+ where a1 = [1, + , . . ., + ]1(pk0 +1) , a2 = [ + , . . ., + ]1p(kk0 ) , a+ and a2 1

and = + for > 1 . For large n, the order 2

that the minimizer of SC(k) or HQ(k) is greater than k0 tends to zero as n .

where 01 and 02 are respectively the probability limits of n1 yw and n1 yx , consisting

HQ 6.148 6.166 6.331

13.137 13.124 13.813 36.563 36.532 36.968

.874 .869 .864 .861 .861 .863 .831 .860 .861

.999 .999 .999 1.000 .999 .999 1.000 .999 .999

.990 .989 .989 .990 .989 .989 .984 .988 .989

.087 .099 .807 .475 .474 .986 .971 .961 1.000

.088 .099 .098 .482 .474 .632 .974 .961 .991

.911 .900 .901 .482 .473 .632 .974 .960 .990

.904 .893 .595 .476 .467 .888 .958 .950 .989

5.778 5.781 5.842 9.925 9.914 10.199 23.188 23.156 23.195

5.774 5.777 5.774 9.927 9.913 10.026 23.181 23.167 23.197

5.774 5.776 5.808 9.927 9.912 10.172 23.180 23.166 23.201

.882 .879 .875 .868 .869 .876 .807 .874 .874

1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

.993 .993 .993 .991 .993 .993 .984 .992 .993

.069 .075 .789 .645 .639 .996 .998 .996 1.000

.069 .075 .057 .649 .639 .677 .998 .996 .998

.931 .925 .943 .649 .639 .677 .998 .996 .998

.924 .919 .664 .643 .635 .926 .982 .989 .993

5.838 5.839 5.848 5.875 5.880 5.805 5.775 5.770 5.770

5.833 5.836 5.835 5.868 5.876 5.861 5.766 5.767 5.766

5.834 5.836 5.843 5.869 5.877 5.815 5.767 5.768 5.767

AC 6.135 6.143 6.357 13.121 13.103 13.813 36.535 36.452 36.885

HQ 6.090 6.076 6.274 13.048 12.947 13.699 36.479 36.255 36.742

.873 .869 .859 .861 .854 .853 .832 .851 .851

.737 .725 .718 .812 .724 .688 .873 .649 .626

.954 .940 .928 .964 .938 .924 .970 .917 .908

.083 .097 .804 .482 .482 .987 .972 .963 1.000

.103 .097 .121 .523 .482 .661 .978 .963 .993

.676 .664 .661 .398 .306 .398 .852 .617 .620

.865 .852 .568 .473 .440 .827 .945 .881 .908

5.771 5.775 5.840 9.931 9.925 10.202 23.189 23.175 23.209

5.746 5.746 5.747 9.934 9.880 10.007 23.181 23.111 23.143

5.767 5.769 5.799 9.933 9.916 10.176 23.188 23.170 23.210

.879 .878 .873 .865 .868 .871 .807 .872 .871

.974 .973 .972 .987 .971 .967 .994 .964 .960

.992 .991 .991 .990 .990 .990 .984 .990 .989

.069 .073 .785 .653 .646 .996 .998 .997 1.000

.071 .073 .057 .660 .646 .678 .998 .997 .998

.906 .902 .920 .649 .621 .648 .992 .960 .959

.924 .919 .667 .651 .640 .924 .982 .986 .989

5.847 5.849 5.856 5.881 5.887 5.817 5.784 5.781 5.782