
Time series

Spring 2007

Lab 5 - Bivariate analyses

This lab covers a lot of material. You will use both time-domain and frequency-domain methods to study connections between two data sets. You will use the acf command from previous labs to estimate cross-correlation functions and the spectrum command to estimate the coherence between two data sets, and you will construct approximate confidence bands for these estimates. In bivariate analyses you may want to describe a connection between two data sets. Can including one data set help describe the other and make predictions more reliable? How much of the stochastic behavior of one data set can be explained by the other? In the second part of the lab we look at how to model this kind of data. In the last part, we estimate the transfer functions of an AR(1) and an MA(1) process constructed from a known innovation series, and compare the estimates to the true transfer functions. Among other things, you will see that a phase plot of the estimated transfer function can tell you whether one data set is leading or lagging the other.

Coherence, phase and cross-correlation


The cross-covariance function of two time series $X$ and $Y$ is defined as $r_{X,Y}(s,t) = E(X_s Y_t) - E(X_s)E(Y_t)$. If the series are jointly stationary, $r_{X,Y}(s,t)$ depends only on the lag $\tau = t - s$. In that case $r_{X,Y}(\tau) = r_{Y,X}(-\tau)$, but note that $r_{X,Y}(\tau)$ is generally not the same as $r_{X,Y}(-\tau)$. When you plotted the auto-covariance function you only saw the result for positive $\tau$ because of the symmetry $r_{X,X}(\tau) = r_{X,X}(-\tau)$; for the cross-covariance, however, acf(ts.union(data1,data2)) produces 4 plots. The diagonal plots are the auto-covariance functions of data sets 1 and 2. The upper right plot shows, for lags $h \ge 0$, the correlation between data set 1 at time $t+h$ and data set 2 at time $t$, i.e. data set 1 lagging data set 2. The lower left plot shows the same correlation for negative lags $h$. The acf command plots correlations unless you specify otherwise. These are obtained as $\rho_{X,Y}(\tau) = r_{X,Y}(\tau)/\sqrt{r_{X,X}(0)\,r_{Y,Y}(0)}$.

How is the cross-covariance estimated? For $\tau \ge 0$,

$$\hat r_{X,Y}(\tau) = \frac{1}{T}\sum_{t=1}^{T-\tau} \big(x(t)-\bar x\big)\big(y(t+\tau)-\bar y\big),$$

and for $\tau < 0$,

$$\hat r_{X,Y}(\tau) = \frac{1}{T}\sum_{t=-\tau+1}^{T} \big(x(t)-\bar x\big)\big(y(t+\tau)-\bar y\big).$$

Construct two WN processes where the second is the same as the first but lagged 3 time units. Estimate the cross-correlation function and state the result.
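A quick way to check the estimator above is to code it directly and compare it with R's acf() before running the lagged white-noise example below. This is a sketch in base R only; cross_cov_hat() is a hypothetical helper written for this illustration. It relies on acf() storing, in element [k + 1, i, j] of its result, the covariance between series i at time t + k and series j at time t, computed with divisor T.

```r
# Sketch: the cross-covariance estimator written out by hand.
# cross_cov_hat() is a hypothetical helper, not a function from the lab.
cross_cov_hat <- function(x, y, tau) {
  n  <- length(x)
  xb <- mean(x)
  yb <- mean(y)
  # tau >= 0: sum over t = 1, ..., n - tau; tau < 0: t = -tau + 1, ..., n
  idx <- if (tau >= 0) 1:(n - tau) else (1 - tau):n
  sum((x[idx] - xb) * (y[idx + tau] - yb)) / n   # divisor n (= T), not n - |tau|
}

set.seed(1)
x <- rnorm(200)
y <- rnorm(200)

# acf()$acf[k + 1, i, j] estimates Cov(series_i[t + k], series_j[t]), k >= 0
a <- acf(cbind(x, y), type = "covariance", lag.max = 5, plot = FALSE)$acf

# r_XY(tau) = Cov(x[t], y[t + tau]) sits in a[tau + 1, 2, 1] for tau >= 0
# and in a[-tau + 1, 1, 2] for tau < 0
r_pos <- sapply(0:5, function(k) cross_cov_hat(x, y,  k))
r_neg <- sapply(1:5, function(k) cross_cov_hat(x, y, -k))
print(isTRUE(all.equal(r_pos, as.vector(a[1:6, 2, 1]))))
print(isTRUE(all.equal(r_neg, as.vector(a[2:6, 1, 2]))))
```

Both comparisons should print TRUE, which means the off-diagonal acf() panels can be read directly in terms of $\hat r_{X,Y}(\tau)$.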

    e <- rnorm(300)
    e1 <- ts(e[1:250])
    e2 <- ts(e[4:253])
    acf(ts.union(e1, e2))

You clearly see the lag of e1 on e2 in the plot (e1 takes on the same value as e2 3 time units later). What happens if you use acf(ts.union(e2,e1))? The lesson is to be careful and consistent about which series you consider the dependent or output signal and which the input signal. From now on, let's consider e2 (the leading series) to be the input to some system and e1 what we get out of the system.

The dashed lines indicating confidence bands in the acf plot are based on WN processes, as before. This may grossly underestimate the variance of the cross-correlation estimates! If there is a lot of structure in the individual data sets, the white-noise confidence bands for the cross-correlation estimate can be very misleading. You can try this yourself on two independent data sets to which we add some structure by low-pass filtering them. Use an MA(q) model for some q as a low-pass filter and create two independent time series. Now estimate their cross-correlation function.

    e1f <- filter(rnorm(250), filter=?)
    e2f <- filter(rnorm(250), filter=?)
    acf(ts.union(ts(e1f), ts(e2f)))

(Note: you have to remove the NA values in e1f and e2f before using acf.) As you can see, even though the data sets are independent of each other, some of the sample cross-correlation estimates are significant. The reason is that the confidence bands underestimate the variance of the cross-correlation estimate. You could try prewhitening the data before estimating the cross-correlation function (do this and plot the result). You could also try bootstrapping. There are several bootstrap programs posted on the class homepage. Download cfdirect.q, statboot.q and FDJ2.q and try them out (review the handout for more details on the direct method, the stationary bootstrap and the frequency domain jackknife).

    cfband(e2f, e1f)
    FDJ2(e2f, e1f)
    statboot(.05, 500, e2f, e1f)

Read the header part of each function to understand what the different input defaults are.
Try changing some defaults (e.g. lagmax, p, B). Comment.
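One way to prewhiten is to fit a separate AR model to each series and cross-correlate the residuals. The sketch below rests on assumptions that are not in the lab: a 6-term moving average as the MA(5) low-pass filter, and AR models selected by AIC with ar() as the whitening filters. It is not necessarily the procedure used in the class programs.

```r
# Sketch: prewhitening before estimating the cross-correlation function.
# Assumed: equal-weight MA(5) low-pass filter; AR whitening filters chosen by AIC.
set.seed(2)
q  <- 5
ma <- rep(1, q + 1) / (q + 1)
# filter() with the default sides = 2 leaves NAs at both ends; drop them
e1f <- as.numeric(na.omit(stats::filter(rnorm(250), ma)))
e2f <- as.numeric(na.omit(stats::filter(rnorm(250), ma)))

# Raw sample CCF of two independent but strongly autocorrelated series
acf(ts.union(ts(e1f), ts(e2f)))

# Whiten each series with its own fitted AR model; the residuals
# should be close to white noise
fit1 <- ar(e1f)
fit2 <- ar(e2f)
w1 <- as.numeric(na.omit(fit1$resid))   # the first 'order' residuals are NA
w2 <- as.numeric(na.omit(fit2$resid))

# Both residual series end at the same time point, so keeping the last
# n values of each keeps them aligned
n  <- min(length(w1), length(w2))
w1 <- tail(w1, n)
w2 <- tail(w2, n)

# The white-noise bands are a more sensible reference for these residuals
acf(ts.union(ts(w1), ts(w2)))
```

Strictly, the classical recipe applies the same whitening filter (the one fitted to the input series) to both series. With independent series, as here, whitening each series with its own model is enough to see the spurious significant values become much rarer.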

Coherence and Phase


The coherence is a measure of the correlation between two time series at each frequency. Put another way, it measures how much of one data set can be explained by a linearly filtered version of the other. The phase spectrum measures the phase shift between the data sets at each frequency. The squared coherence is defined as

$$|R(\lambda)|^2 = \frac{|f_{Y,X}(\lambda)|^2}{f_{Y,Y}(\lambda)\, f_{X,X}(\lambda)}$$

and the phase as

$$\mathrm{ph}(\lambda) = \arctan\!\left(\frac{\mathrm{Im}(f_{Y,X}(\lambda))}{\mathrm{Re}(f_{Y,X}(\lambda))}\right),$$

where $f_{Y,X}$ is the cross-spectrum (the Fourier transform of the cross-covariance). How should we estimate the coherence? Assuming mean-zero variables, we could start with the cross-periodogram

$$I_{X,Y}(\lambda) = \frac{1}{2\pi T}\left(\sum_{s=0}^{T-1} e^{-i\lambda s} X(s)\right)\left(\sum_{t=0}^{T-1} e^{i\lambda t} Y(t)\right).$$

However, if we write $J_X(\lambda) = \sum_{s=0}^{T-1} e^{-i\lambda s} X(s)$, and similarly for $Y$, then $I_{X,Y}(\lambda) = J_X(\lambda)\,\overline{J_Y(\lambda)}/(2\pi T)$ and one can see that

$$\frac{|I_{X,Y}(\lambda)|^2}{I_{X,X}(\lambda)\, I_{Y,Y}(\lambda)} = \frac{|J_X(\lambda)|^2\,|J_Y(\lambda)|^2}{|J_X(\lambda)|^2\,|J_Y(\lambda)|^2} = 1.$$

I.e. the estimated coherence would always be 1. To get a better estimate, we smooth the periodograms and compute the coherence as above from the smoothed versions. You can get the function cohplot1 from the class home page; it plots the coherence with 95 percent confidence bands. We went through the approximate asymptotic distribution of the coherence and phase estimates in class. Briefly, we apply a variance-stabilizing transform (arctanh) to the estimated coherence. The transformed estimate is then approximately normally distributed, with variance gg/2 where gg = 2/v (v is the smoothing parameter). The phase is estimated similarly from the smoothed periodograms. The function phaplot1, also on the class home page, plots the phase spectrum with confidence bands. Plot the coherence and the phase spectrum for the time series e1 and e2 and comment.

    f <- spectrum(ts.union(e1, e2), spans=c(3,3))
    cohplot1(f)
    phaplot1(f)

Notice that the coherence is very close to 1 at all frequencies. The phase plot may look a little odd: I have restricted the plot to the interval $[-\pi, \pi]$. Between the jump points you can see that the phase plot has a slope. Calculate the slope.

    plot(f$freq, f$phase)

Divide the slope you get by $2\pi$. What is the result? State why.

Let's use the coherence and phase plots on another data set. Simulate two data sets whose noise is unrelated but which share the same frequency component, shifted in phase by some amount. Plot the coherence and phase spectrum. Where are the confidence bands narrow? What is the phase shift between the series at that frequency?

    t <- 1:250
    y1 <- ts(8*sin(2*pi*0.15*t) + rnorm(250))
    y2 <- ts(8*sin(2*pi*0.15*t - 1) + rnorm(250))
    plot(y1)
    plot(y2)
    f <- spectrum(ts.union(y2, y1), spans=c(3,3))
    cohplot1(f)
    phaplot1(f)
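The sketch below illustrates the points of this section with base R only: the unsmoothed coherence is identically 1, an arctanh confidence band, and how the phase carries the lag and phase-shift information. coh_band() is a hypothetical helper, not cohplot1. It assumes that v is the equivalent degrees of freedom f$df reported by spectrum(), and applies arctanh to $|\hat R|$, the square root of the squared coherence that spectrum() returns; both choices are assumptions. The phase slope is found by unwrapping the phase and fitting a straight line with lm(), which is one simple option among several.

```r
# Sketch of coherence and phase estimates computed from spectrum() output.
# coh_band() is a hypothetical helper; treating v as f$df is an assumption.
set.seed(3)

## 1. Raw versus smoothed coherence for two independent white-noise series
x   <- ts(rnorm(250))
y   <- ts(rnorm(250))
raw <- spectrum(ts.union(x, y), plot = FALSE)             # no smoothing
print(range(raw$coh))                                     # 1 everywhere, up to rounding
f0  <- spectrum(ts.union(x, y), spans = c(3, 3), plot = FALSE)

## 2. Approximate 95% band: arctanh(|R|) is roughly normal with variance
##    gg / 2 = 1 / v, taking v to be the equivalent degrees of freedom f$df
coh_band <- function(f, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)
  se <- sqrt(1 / f$df)
  r  <- pmin(sqrt(f$coh[, 1]), 0.99999)                   # cap keeps atanh() finite
  cbind(lower = pmax(tanh(atanh(r) - z * se), 0)^2,
        upper = tanh(atanh(r) + z * se)^2)
}
band <- coh_band(f0)
plot(f0$freq, f0$coh[, 1], type = "l", ylim = c(0, 1),
     xlab = "frequency", ylab = "squared coherence")
matlines(f0$freq, band, lty = 2, col = 1)

## 3. Phase slope when e1 is e2 delayed by 3 time units
e  <- rnorm(300)
e1 <- ts(e[1:250])
e2 <- ts(e[4:253])
f  <- spectrum(ts.union(e1, e2), spans = c(3, 3), plot = FALSE)
ph     <- f$phase[, 1]                                    # wrapped to (-pi, pi]
ph_unw <- ph[1] + c(0, cumsum(Arg(exp(1i * diff(ph)))))   # unwrap the jumps
slope  <- unname(coef(lm(ph_unw ~ f$freq))[2])
print(slope / (2 * pi))

## 4. Phase at the frequency of highest coherence for y1 and y2
tt <- 1:250
y1 <- ts(8 * sin(2 * pi * 0.15 * tt) + rnorm(250))
y2 <- ts(8 * sin(2 * pi * 0.15 * tt - 1) + rnorm(250))
g  <- spectrum(ts.union(y2, y1), spans = c(3, 3), plot = FALSE)
k  <- which.max(g$coh[, 1])
print(c(freq = g$freq[k], phase = g$phase[k, 1]))
```

Compare the printed slope and phase with your answers to the questions above; the sign depends on the order of the series in ts.union().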

ARMAX models, regression variables, Multivariate AR


The multivariate AR process is defined as $X(t) = \sum_{i=1}^{p} A_i X(t-i) + \epsilon(t)$. Note that $X(t) = (X_1(t), X_2(t), \ldots, X_r(t))'$ and $A_i$ is an $r \times r$ matrix. Off-diagonal elements indicate that the values of series $X_j$ depend not only on lagged values of that series but also on $X_i$ for some $i \ne j$. You are going to simulate an ARMAX process, where the X denotes an input signal that partially drives the output:

$$Y(t) = a_1 Y(t-1) + a_2 Y(t-2) + \cdots + e(t) + b_1 e(t-1) + \cdots + c_0 X(t) + c_1 X(t-1) + \cdots$$

You can then use the acf function or fit a multivariate AR process to the data to look for a connection between Y and X.

    e <- rnorm(501)
    x <- rnorm(502)
    xmat <- cbind(x[2:501], x[1:500])
    ys <- 0
    for (k in 1:500) {
      if (k == 1) {
        ys[k] <- e[k+1] + .6*xmat[k,1] + .3*xmat[k,2]
      }
      if (k > 1) {
        ys[k] <- .8*ys[k-1] + e[k+1] + .6*xmat[k,1] + .3*xmat[k,2]
      }
    }
    ys <- ts(ys[251:500])
    xs <- ts(x[253:502])
    acf(ts.union(ys, xs))

The model you simulated can be written as $Y(t) = 0.8\,Y(t-1) + e(t) + 0.6\,X(t-1) + 0.3\,X(t-2)$. I have denoted by ys and xs the data sets that would be available to you after a series of measurements of Y and X. As you can see from the acf plot, the output ys is correlated with the input xs in such a way that the connection seems to go through the lag-1 and lag-2 values of xs. Remember that the top right plot of the acf shows ys lagging xs. Try fitting a multivariate AR process using the ar command.

    arm <- ar(cbind(ys, xs))

Check the estimated coefficients and comment. Which coefficients are large? Try another set of coefficient values in the simulation and comment on the outcome. Now try to model the data using arima. Of course you know which orders of MA and AR components to use; in practice you would have to figure that out by looking at acf and partial acf plots and checking the AIC. But how would you take the dependence on xs into account? You have guessed from the acf plot that ys depends on the lag-1 and lag-2 values of xs. Estimate the coefficients like this:

    ysa <- ts(ys[3:250])
    xsa1 <- ts(xs[2:249])
    xsa2 <- ts(xs[1:248])
    mod <- arima(ysa, order=c(1,0,0), xreg=cbind(xsa1, xsa2))
    tsdiag(mod)
    mod$coef

As you can see, I have re-aligned the series ys with the lags of xs it depends on. Check the values of the estimated coefficients. In general, if you want to fit an ARMAX model to your data you will construct a matrix of lagged values of the input signal. Let's say your study of the series, the acf, and possibly a fit of a multivariate AR have led you to believe that this model might be appropriate:

$$Y(t) = a_1 Y(t-1) + e(t) + b_1 e(t-1) + c_0 X(t) + \cdots + c_l X(t-l)$$

You would start by setting up the matrix xmat as follows:

    xmat <- cbind(xs[(l+1):length(xs)], xs[l:(length(xs)-1)], ..., xs[1:(length(xs)-l)])

Now you can fit the model using arima as above:

    mod <- arima(ys[(l+1):length(ys)], order=c(p,0,q), xreg=xmat)

Of course, you can also take differences, do seasonal adjustment, include longer MA and AR components, etc.
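Rather than typing each column of xmat, embed() builds the matrix of lagged values in one call: embed(z, l + 1) returns rows (z[t], z[t-1], ..., z[t-l]) for t = l+1, ..., n. One point worth checking when you compare estimates: R's arima() with xreg fits a linear regression on the xreg columns with ARMA errors, which is not the same parametrization as the ARMAX difference equation above, so its xreg coefficients need not match $c_1 = 0.6$, $c_2 = 0.3$. When there is no MA term, the difference equation can instead be fitted by least squares with lagged Y as a regressor. The sketch re-simulates the model with its own seed, burn-in and sample size (all assumptions) and fits both.

```r
# Sketch, assuming the ARX model simulated above (no MA term, l = 2).
set.seed(4)
n <- 550
x <- rnorm(n)
e <- rnorm(n)
y <- numeric(n)
for (k in 3:n) {
  y[k] <- 0.8 * y[k - 1] + e[k] + 0.6 * x[k - 1] + 0.3 * x[k - 2]
}
ys <- y[51:n]                       # drop a burn-in; ys and xs stay aligned
xs <- x[51:n]

l <- 2
X <- embed(xs, l + 1)               # columns X(t), X(t-1), X(t-2)
Y <- embed(ys, l + 1)               # columns Y(t), Y(t-1), Y(t-2)
colnames(X) <- paste0("x_lag", 0:l)

# Regression on the lagged inputs with AR(1) errors, as with the lab's arima() call
mod_arima <- arima(Y[, 1], order = c(1, 0, 0), xreg = X)

# The ARX difference equation itself, fitted by least squares:
# Y(t) on Y(t-1) and X(t), ..., X(t-l)
d <- data.frame(y = Y[, 1], y_lag1 = Y[, 2], X)
mod_arx <- lm(y ~ y_lag1 + x_lag0 + x_lag1 + x_lag2, data = d)

print(round(coef(mod_arima), 2))
print(round(coef(mod_arx), 2))
```

With an MA term in the model, least squares on lagged Y is no longer enough, and a full ARMAX likelihood fit would be needed instead.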

Estimating the transfer function


Let's assume $Y(t) = \sum_u a(u) X(t-u)$. The power spectra of the two series are then related by $f_{Y,Y}(\lambda) = |A(\lambda)|^2 f_{X,X}(\lambda)$, where $A(\lambda)$ is the transfer function, i.e. the Fourier transform of the impulse response $a$. It is often of interest to estimate the transfer function, e.g. for filter analysis. The transfer function is in general complex-valued: its phase (argument) indicates whether one series is lagging or leading the other, and its gain (modulus) tells you which frequencies are transmitted. However, a word of caution. Estimating the transfer function is similar to a linear regression: what if the relation between the series isn't linear? Be careful when drawing conclusions from the estimated transfer function. Always check for non-linearities, and check the coherence. If the coherence is near 1, a linear model is more easily justified. This is also apparent from the variance of the estimated transfer function. The estimate of the transfer function is obtained from the smoothed power spectra of the series Y and X:

$$|\hat A(\lambda)| = \left(\frac{\hat f_{Y,Y}(\lambda)}{\hat f_{X,X}(\lambda)}\right)^{1/2}$$

The variance of the logarithm of the modulus of the estimate is given by

$$\mathrm{Var}\big(\log|\hat A(\lambda)|\big) = \frac{|R(\lambda)|^{-2} - 1}{2(L-1)},$$

where $|R(\lambda)|^2$ is the squared coherence and $L$ is the smoothing parameter of the spectra. As you can see, if the coherence is near 1 the variance of the estimate is small. The estimated phase of A is given by the phase of the cross-spectrum, and its variance is the same as the variance of $\log|\hat A|$.

Let's estimate some transfer functions. An MA(1) process can be obtained from an innovation process as $Y(t) = e(t) + \theta e(t-1)$. The transfer function relating the series Y to e is $A(\lambda) = 1 + \theta e^{-i\lambda}$. Simulate an innovation process e and the MA(1) process with parameter $\theta = 0.8$. Estimate the transfer function and compare it to the true A stated here. The function esttransfer(output,input) can be found on the section home page. It estimates the transfer function from the smoothed spectra and then plots the logarithm of the modulus of A and the phase, with confidence bands.

    e <- ts(rnorm(251))
    # filter() with the default sides = 2 gives xt[t] = e[t+1] + 0.8*e[t];
    # dropping e[1] below re-aligns this to e(t) + 0.8*e(t-1)
    xt <- ts(filter(e, filter=c(1,.8))[1:250])
    e <- ts(e[2:251])
    acf(ts.union(xt, e))
    f <- spectrum(ts.union(xt, e), spans=c(3,3))
    cohplot1(f)
    phaplot1(f)
    lam <- f$freq*2*pi
    AA <- 1 + 0.8*exp(-1i*lam)
    mA <- esttransfer(xt, e)
    plot(f$freq, log(Mod(AA)))
    plot(f$freq, Arg(AA))

The coherence between the two series is near 1 for almost all frequencies, so the confidence band of the estimated transfer function is quite narrow. Let's try another process. Simulate an AR(1) process with parameter $\phi = 0.8$. The transfer function is now $A(\lambda) = 1/(1 - 0.8\,e^{-i\lambda})$. Estimate the transfer function of the AR process as above and compare it to the true A. Now change the sample size and/or the coefficient value of the AR process: what happens to the estimated transfer function in these settings? Next we use the series e1 and e2 from above; e2 was the input series and e1 the output. Estimate the transfer function in this case. What do the gain and phase plots look like? What about the series y1 and y2? What do the plots look like? Comment. What are the gain and phase of the estimated transfer function at frequencies where the coherence is high?
Explain, and make sure you get the right units (Hint 1: $2\pi$, log. Hint 2: use p <- locator() to get the coordinates of a point in a plot). Verify these results with the theoretical transfer functions under the simulation models.
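If esttransfer is not at hand, the point estimates can be sketched from spectrum() output using the formulas above: gain $(\hat f_{Y,Y}/\hat f_{X,X})^{1/2}$ and phase equal to the phase of the smoothed cross-spectrum. est_transfer_sketch() below is a hypothetical helper that does not draw confidence bands; the sample size, seed and spans are assumptions. It uses filter(..., sides = 1) so that the MA(1) output is e(t) + 0.8 e(t-1) directly, and method = "recursive" for the AR(1).

```r
# Sketch: transfer-function point estimates from smoothed spectra.
# est_transfer_sketch() is a hypothetical stand-in for esttransfer().
est_transfer_sketch <- function(output, input, spans = c(3, 3)) {
  f <- spectrum(ts.union(output, input), spans = spans, plot = FALSE)
  list(freq     = f$freq,
       log_gain = 0.5 * log(f$spec[, 1] / f$spec[, 2]),  # log sqrt(f_YY / f_XX)
       phase    = f$phase[, 1],                          # phase of the cross-spectrum
       coh      = f$coh[, 1])
}

set.seed(5)
n <- 1000
e <- ts(rnorm(n + 1))
# MA(1): Y(t) = e(t) + 0.8 e(t-1); sides = 1 uses current and past values only
y_ma <- ts(stats::filter(e, c(1, 0.8), sides = 1)[-1])
# AR(1): Y(t) = 0.8 Y(t-1) + e(t)
y_ar <- ts(stats::filter(e, 0.8, method = "recursive")[-1])
e    <- ts(e[-1])                                        # align with the outputs

est_ma <- est_transfer_sketch(y_ma, e)
est_ar <- est_transfer_sketch(y_ar, e)

lam  <- 2 * pi * est_ma$freq                             # radians per time unit
A_ma <- 1 + 0.8 * exp(-1i * lam)
A_ar <- 1 / (1 - 0.8 * exp(-1i * lam))

op <- par(mfrow = c(2, 2))
plot(est_ma$freq, est_ma$log_gain, type = "l", main = "MA(1): log gain")
lines(est_ma$freq, log(Mod(A_ma)), lty = 2)
plot(est_ma$freq, est_ma$phase, type = "l", main = "MA(1): phase")
lines(est_ma$freq, Arg(A_ma), lty = 2)
plot(est_ar$freq, est_ar$log_gain, type = "l", main = "AR(1): log gain")
lines(est_ar$freq, log(Mod(A_ar)), lty = 2)
plot(est_ar$freq, est_ar$phase, type = "l", main = "AR(1): phase")
lines(est_ar$freq, Arg(A_ar), lty = 2)
par(op)
```

The dashed lines are the true log gain and phase. Because the outputs here are exact linear filters of e, the coherence is close to 1 and the solid and dashed curves should nearly coincide; adding noise to the output lowers the coherence and, by the variance formula above, widens the scatter.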
