by
Brandon J Whitcher
Doctor of Philosophy
University of Washington
1998
Approved by
(Chairperson of Supervisory Committee)
Program Authorized
to Offer Degree
Date
In presenting this dissertation in partial fulfillment of the requirements for the
Doctoral degree at the University of Washington, I agree that the Library shall make
its copies freely available for inspection. I further agree that extensive copying of
this dissertation is allowable only for scholarly purposes, consistent with "fair use"
as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of
this dissertation may be referred to University Microfilms, 1490 Eisenhower Place,
P.O. Box 975, Ann Arbor, MI 48106, to whom the author has granted "the right to
reproduce and sell (a) copies of the manuscript in microfilm and/or (b) printed copies
of the manuscript made from microfilm."
Signature
Date
University of Washington
Abstract
Chairperson of Supervisory
Committee: Professor Peter Guttorp & Professor Donald B. Percival
Statistics & Applied Physics Laboratory
The discrete wavelet transform has been used extensively in the field of Statistics,
mostly in the area of "denoising signals" or nonparametric regression. This thesis
provides a new application for the discrete wavelet transform: assessing nonstationary
events in time series, especially long memory processes. Long memory processes are
those which exhibit substantial correlations between events separated by a long period
of time.
Departures from stationarity in these heavily autocorrelated time series, such as
an abrupt change in the variance at an unknown location or "bursts" of increased
variability, can be detected and accurately located using discrete wavelet transforms,
both orthogonal and overcomplete. A cumulative sum of squares method, utilizing
a Kolmogorov–Smirnov-type test statistic, is applied to this problem. By analyzing
a time series on a scale by scale basis, each scale corresponding to a range of
frequencies, the ability to detect and locate a sudden change in the variance in the
time series is introduced. Using this same procedure to detect a change in the long
memory parameter, when the process variance remains constant, is also briefly
investigated. Applications involve Nile River minimum water levels and vertical ocean
shear measurements.
In the atmospheric sciences, broadband features in the spectrum of recorded time
series have been hypothesized to be nonstationary events, e.g., the Madden–Julian
oscillation. The Madden–Julian oscillation is a result of large-scale circulation cells
oriented in the equatorial plane from the Indian Ocean to the central Pacific. The
oscillation has been noted to have higher frequencies during warm events in El Niño–
Southern Oscillation (ENSO) years. The concepts of wavelet covariance and wavelet
correlation are introduced and applied to this problem as an alternative to
cross-spectrum analysis. The wavelet covariance is shown to decompose the covariance
between two stationary processes on a scale by scale basis. Asymptotic normality of
estimators of the wavelet covariance and correlation is shown in order to construct
approximate confidence intervals. Both quantities are generalized into the wavelet
cross-covariance and cross-correlation in order to investigate possible lead/lag
relations in bivariate time series on a scale by scale basis.
Atmospheric measurements (such as station pressure and zonal wind speeds) from
a single station at Canton Island (2.8° S, 171.7° W) are put through a wavelet analysis
of covariance and are shown to provide similar results to those found in Madden and
Julian (1971) and multitaper spectral techniques. To investigate the possible
interaction between ENSO activity and the Madden–Julian oscillation, a daily "Southern
Oscillation Index" and station pressure series collected from Truk Island (7.4° N,
151.8° W) are analyzed. The wavelet cross-covariance nicely decomposes the usual
cross-covariance into scales which are more easily associated with atmospheric
phenomena. The time-varying wavelet variance and covariance are used to investigate
possible seasonal effects and changes due to ENSO activity.
TABLE OF CONTENTS
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Motivation
1.1.1 Detecting Nonstationary Events in Long Range Dependence
1.1.2 Wavelet Analysis of Bivariate Time Series
1.2 Outline of Thesis
1.3 Contributions
LIST OF TABLES
3.1 Scaling coefficients for the Daubechies least asymmetric wavelet filter of length L = 8
3.2 Equivalent degrees of freedom for the MODWT of white noise
3.3 Large sample approximation to the ratio of equivalent degrees of freedom $\eta_j / N$
4.1 Maximum dynamic range for the spectra of DWT wavelet coefficients when applied to
fractional difference processes
4.2 Maximum dynamic range for the spectra of DWT wavelet coefficients when applied to
AR(1) and MA(1) processes
4.3 Monte Carlo critical values for the test statistic $(N/2)^{1/2} D$
4.4 Performance of the cumulative sum of squares method for fractional difference
processes with one variance change
4.5 Empirical power of the iterated CSS algorithm for fractional difference processes
with one variance change
4.6 Empirical power of the iterated CSS algorithm for fractional difference processes
with two variance changes
4.7 Rejection rates for a change in the long memory parameter of a fractional
difference process
5.1 Variance of ^XY(j), j = 1, ..., 6, for two white noise time series associated via
linear regression with delay
5.2 Empirical bias and mean squared error of Vej, j = 1, ..., 6, for uncorrelated white
noise processes
5.3 Empirical bias and mean squared error of Vej, j = 1, ..., 6, for white noise
processes which are related via linear regression with delay
6.1 Results of testing the Nile River minimum water levels for homogeneity of variance
ACKNOWLEDGMENTS
I would like to thank those people most directly involved with this disser-
tation. My two principal advisors, Professors Peter Guttorp and Don Percival,
guided me and never stopped demanding a high level of my understanding and
of my work. I would also like to thank the other members of my committee:
Professors Paul Sampson, Chris Bretherton, and Stephen Majeski.
I would especially like to thank my parents, Dona Farsdahl and Den-
nis Whitcher. Their willingness to provide me with every resource possible
throughout my life in order to succeed is the reason why I am completing this
degree.
Finally, I would like to thank my fellow graduate students for good times,
stimulating conversations and unparalleled drinking.
Chapter 1
INTRODUCTION
1.1 Motivation
The analysis of time series has often been difficult when data do not conform to
well-studied theoretical concepts. One of the most common statistical properties violated
by time series data is stationarity. A time series is considered (weakly or second-order)
stationary when it has a mean and autocovariance sequence that do not vary with
time. It is not uncommon to encounter departures from stationarity in recorded time
series from the physical sciences, e.g., atmospheric science. There, seasonal effects
are not limited to the mean of a time series, but may also enter into the variance.
Some atmospheric variables are known, for instance, to exhibit increased variability
in the winter of each year. Other time series exhibit a persistence of correlation much
longer than can be explained by short memory (ARIMA) models; they are known
as long memory processes. The existence of data such as these, which defy current
statistical methods, motivates researchers to develop better theories and better tools
with which to analyze them. In this dissertation I present statistical techniques
that can be useful for detecting and evaluating nonstationary events in univariate
or bivariate time series. A complicating factor in many situations is the presence of
slowly decaying autocorrelations, or long memory, in a time series. The techniques
presented here are shown to perform well whether short memory or long memory
structure is assumed.
Another concept which arises in the physical sciences is the notion of "multiscale
features." That is, an observed time series may contain several phenomena, each
occurring on different time scales (these correspond to ranges of frequencies in the
Fourier domain). An example in atmospheric science would be that weather has a
very short time scale, around three days, while seasonal patterns occur around 365
days when measured at a single station away from the equator. Wavelet techniques
possess a natural ability to decompose time series into several sub-series which may
be associated with particular time scales. Hence, interpretation of features in complex
atmospheric time series may be made easier by first applying a wavelet transform and
subsequently interpreting each individual sub-series.
This dissertation grew out of a project to investigate atmospheric phenomena, such
as the Madden–Julian oscillation (MJO) (Madden and Julian 1971), using wavelet
techniques. While developing sound statistical quantities and tests, I have also tried
to keep in mind their application to relevant scientific questions and interpretability.
researchers have investigated detecting and locating not single changes of variance,
but multiple changes. Techniques used include a cumulative sum of squares method
(Inclan and Tiao 1994) and an information criterion method (Chen and Gupta 1997).
Second, the discrete wavelet transform (DWT) has been shown to approximately
decorrelate time series with long memory structure; see, for example, Tewfik and
Kim (1992), McCoy and Walden (1996) and Wornell (1996). In fact, the DWT of a
long memory process produces several sub-series which are approximately white noise
sequences. Features which differ from this long memory structure, such as sudden
changes of variance, are retained in certain sub-series of wavelet coefficients.
We take advantage of this approximate "decorrelation" of the DWT and the
simplicity of the cumulative sum of squares method to test for homogeneity of variance,
on a scale by scale basis, of long memory processes in Chapter 4. This provides a
statistically sound technique of testing for nonstationary features without knowing
the exact nature of the correlation structure in a given time series. The methodology
developed in Chapter 4 is applied to the minimum water levels of the Nile River
(Toussoun 1925) in Section 6.1, a time series known to exhibit long-range dependence
(Mohr 1981; Graf 1983; Beran 1994, p. 22). I also analyze measurements of vertical
ocean shear (Percival and Guttorp 1994) in Section 6.2. While this series does not
appear to exhibit long memory structure, it is a good application of the multiple
change point testing procedure where exact knowledge of the underlying spectrum is
not required. The residual correlation in the wavelet coefficients of both short and
long memory processes is investigated in Section 4.1.
from 30–60 days and has appeared in many studies in the Indian Ocean and tropical
Pacific Ocean; see, e.g., Madden and Julian (1994) for a review. This apparent
broadband nature of the oscillation has been hypothesized as being nonstationary,
so the broad peak observed in previous spectral analyses might be attributed to
the fading in-and-out of the oscillation over the time series of measurements. El
Niño–Southern Oscillation (ENSO) events have also been hypothesized
to affect the period of the MJO (Gray 1988; Kuhnel 1989). The ability of the wavelet
transform to capture variability in both time and scale may provide insight into the
nature of atmospheric phenomena such as the MJO, but first bivariate techniques
must be developed.
Wavelet methods for time series analysis have been applied primarily to univariate
processes, with the following exceptions. There has been some work in the
field of turbulence, in a thesis by Hudgins (1992) and a subsequent paper by Hudgins,
Friehe, and Mayer (1993). Hudgins used the output from wavelet transforms
to measure association between turbulent velocity components in the atmosphere.
A few articles also appear in the engineering literature from Japan. Kawata and
Arimoto (1996) were interested in signal matching for pattern recognition problems,
and Li and Nozaki (1997) used the wavelet cross-correlation of two velocity signals
in order to reveal similar structures on a scale by scale basis at particular delays and
times. Recently, Torrence and Compo (1998) discussed the cross-wavelet spectrum,
which is complex valued, and the cross-wavelet power, which is simply the magnitude
of their cross-wavelet spectrum. They also introduced confidence intervals for
their cross-wavelet power and compared the Southern Oscillation Index (SOI) with
the Niño3 sea surface temperature (SST). Both time series are measures of ENSO
activity: the SOI is defined to be the seasonally averaged pressure difference between
Darwin, Australia, and Tahiti, French Polynesia, and the Niño3 SST is the seasonal
SST averaged over the central Pacific (5° S–5° N, 90°–150° W).
The articles discussed above solely utilized the continuous wavelet transform.
Lindsay, Percival, and Rothrock (1996) defined the sample wavelet covariance for
the DWT and maximal overlap DWT (a redundant version of the DWT), along with
confidence intervals based on large sample results. These methods were applied to
the surface temperature and albedo of ice pack in the Beaufort Sea, off the coast of
Alaska and the Northwest Territory.
I introduce the wavelet covariance and correlation in Chapter 5, establishing their
asymptotic distributions for certain Gaussian processes. The wavelet covariance is
shown to decompose the covariance between two stationary processes on a scale by
scale basis. The wavelet cross-covariance and cross-correlation are also defined in
order to perform a more thorough scale by scale analysis of bivariate time series.
The same time series used by Madden and Julian (1971) are analyzed using bivariate
wavelet techniques and multitaper spectral methods in Section 6.3. A daily Southern
Oscillation Index is used as an indicator of ENSO activity and compared with the
station pressure at Truk Island (7.4° N, 151.8° W) in order to investigate the possible
relationship between ENSO events and the MJO (Section 6.4).
1.3 Contributions
The following is a list of original contributions in this dissertation:
Proof that the wavelet covariance decomposes the covariance between two
stationary processes on a scale by scale basis (Section 5.1.1).
This allows for the construction of confidence intervals when estimating the
wavelet covariance.
Demonstration that the lack of shift invariance of the DWT introduces bias into
the variance of the DWT estimator of the wavelet covariance (Section 5.2.2).
Demonstration of evidence for a change in the variance of the Nile River
minimum water levels (Toussoun 1925) instead of a change in the long memory
parameter, as proposed in Beran and Terrin (1996) (Section 6.1).
Investigation of the possible interaction between ENSO events and the MJO using
a wavelet analysis of covariance developed in this dissertation (Section 6.4).
Chapter 2
LONG MEMORY PROCESSES
Our current understanding, and more importantly awareness, that natural
phenomena may exhibit long-range dependence is due to the pioneering work by Hurst
(1951). While looking at time series from the physical sciences (e.g., rainfall, tree
rings, river levels, etc.) he noticed that his $R/S$-statistic, on a logarithmic scale, was
randomly scattered around a line with slope $H > 1/2$ for large sample sizes. The
$R/S$-statistic is the rescaled adjusted range and was used to calculate the ideal capacity
of a water reservoir from time $t$ to time $t + k$. Loosely, the numerator $R$ (or adjusted
range) measures the cumulative inflow to the reservoir and the denominator $S$ is
proportional to the standard deviation of all measured inflows. For a stationary process
with short-term dependence, $R/S$ should be proportional to $k^{1/2}$ for $k$ large, so that
$\log R/S$ plotted against $\log k$ has slope $1/2$. The discovery of growth proportional
to $k^H$, with $H > 1/2$, was in direct contradiction to
the theory of such processes at the time. This discovery is known as the Hurst effect.
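As a rough illustration of the rescaled adjusted range described above, the following sketch computes $R/S$ for a single window of a series. It assumes the standard definition summarized in Beran (1994): $R$ is the range of the mean-adjusted partial sums over the window and $S$ is the window standard deviation with divisor $k$. The function name `rs_statistic` and the 0-based indexing are choices made here for illustration, not notation from this dissertation.

```python
import math


def rs_statistic(x, t, k):
    """Rescaled adjusted range R/S for the window x[t], ..., x[t + k - 1].

    Assumed (standard) definition: R is the range of the partial sums of the
    mean-adjusted window, S is the window standard deviation (divisor k).
    """
    window = x[t:t + k]
    if len(window) != k or k < 2:
        raise ValueError("window must contain k >= 2 observations")
    mean = sum(window) / k
    # Partial sums of deviations from the window mean, starting at zero.
    partial, deviations = 0.0, [0.0]
    for value in window:
        partial += value - mean
        deviations.append(partial)
    r = max(deviations) - min(deviations)
    s = math.sqrt(sum((v - mean) ** 2 for v in window) / k)
    if s == 0.0:
        raise ValueError("window has zero variance")
    return r / s
```

Evaluating this for a range of window lengths $k$ and regressing $\log R/S$ on $\log k$ gives the slope that Hurst compared with $1/2$.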
Mandelbrot and co-workers (Mandelbrot and van Ness 1968; Mandelbrot and Wallis
1969) showed that the Hurst effect can be modeled by fractional Gaussian noise
with self-similarity parameter $0 < H < 1$ ($H$ being for Hurst). More information
about the history of long memory processes can be found in Beran (1994). Examples
of such behavior can be found in a variety of disciplines, such as geophysics (Percival
and Guttorp 1994; Walden 1994), hydrology (Lawrence and Kottegoda 1977; Hosking
1984), economics (Jensen 1994) and engineering (Mehrabi, Rassamdana, and Sahimi
1997; Abry and Veitch 1998). In this dissertation, I look at the Nile River minimum
water levels (Toussoun 1925) and vertical shear measurements in the ocean in
Chapter 6.
This chapter is divided into two parts, fractional difference processes and
generalized fractional difference processes. The latter is a generalization of the former
where the difference parameter $d$ is allowed to vary with time. Along with brief
descriptions, simulation techniques for both types of processes are also provided.
2.1.2 Simulation
Davies and Harte (1987) describe a method for simulating certain stationary Gaussian
time series of length $N$ with autocovariances $\gamma_0, \gamma_1, \ldots, \gamma_{N-1}$. The method is based
on the Fourier transform and is as follows, with $n = N$ (Beran 1994, pp. 216–217):
1. Define
$$\lambda_k \equiv \frac{2\pi(k-1)}{2n-2},$$
$k = 1, \ldots, 2n - 2$, and the discrete Fourier transform $\Gamma_k$ of the two-sided
sequence of autocovariances $\gamma_0, \gamma_1, \ldots, \gamma_{n-2}, \gamma_{n-1}, \gamma_{n-2}, \ldots, \gamma_1$,
$$\Gamma_k \equiv \sum_{j=1}^{n-1} \gamma_{j-1} e^{i(j-1)\lambda_k} + \sum_{j=n}^{2n-2} \gamma_{2n-j-1} e^{i(j-1)\lambda_k} \qquad (2.4)$$
for $k = 1, \ldots, 2n - 2$.
2. Check to see that $\Gamma_k > 0$ for all $k = 1, \ldots, 2n - 2$. If this condition does not
hold, the Davies–Harte method will not work for this time series (this is not a
problem with fractional difference processes).
Figure 2.1: Spectral densities for fractional difference processes. The x-axis is
displayed on the $\log_2$ scale. [Axes: frequency, dB; legend: d = 0.05, 0.25, 0.40, 0.45.]
for $k = n + 1, \ldots, 2n - 2$.
4. For $t = 1, \ldots, n$ define
$$X_t \equiv \frac{1}{2\sqrt{n-1}} \sum_{k=1}^{2n-2} \sqrt{\Gamma_k}\, e^{i(t-1)\lambda_k} Z_k. \qquad (2.5)$$
The series $\{X_t\}$ has the desired covariance structure. This method has a
computational advantage since Equations (2.4) and (2.5) can be calculated using the fast
Fourier transform. Percival (1992) compares this method to others in the context of
generating a stationary Gaussian process with specified spectrum.
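The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the S-plus implementation referred to in the text: the sums are evaluated directly instead of with an FFT, and the construction of the complex Gaussian sequence $Z_k$ (step 3) is an assumption based on the usual statement of the method, in which $Z_1$ and $Z_n$ are real with variance 2, $Z_k$ for $k = 2, \ldots, n - 1$ has independent unit-variance real and imaginary parts, and $Z_k$ is the complex conjugate of $Z_{2n-k}$ for $k = n + 1, \ldots, 2n - 2$. The helper `fd_acvs` uses the well-known autocovariance recursion for a fractional difference process with unit innovations variance (Hosking 1981) and serves only as example input; all function names are hypothetical.

```python
import cmath
import math
import random


def fd_acvs(d, n):
    """Lags 0..n-1 of the autocovariance sequence of a fractional difference
    process with unit innovations variance (example input only)."""
    acvs = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for k in range(1, n):
        acvs.append(acvs[-1] * (k - 1 + d) / (k - d))
    return acvs


def circulant_spectrum(acvs):
    """Step 1: DFT of gamma_0, ..., gamma_{n-1}, gamma_{n-2}, ..., gamma_1."""
    n = len(acvs)
    m = 2 * n - 2
    two_sided = list(acvs) + list(acvs[n - 2:0:-1])
    return [
        sum(c * cmath.exp(2j * math.pi * lag * k / m)
            for lag, c in enumerate(two_sided)).real
        for k in range(m)
    ]


def davies_harte(acvs, rng):
    """Simulate len(acvs) values of a Gaussian series with the given acvs."""
    n = len(acvs)
    if n < 2:
        raise ValueError("need at least two autocovariances")
    m = 2 * n - 2
    spectrum = circulant_spectrum(acvs)
    # Step 2: the method requires a non-negative spectrum (small tolerance).
    if min(spectrum) < -1e-9:
        raise ValueError("negative Gamma_k: method not applicable")
    # Step 3 (assumed form): conjugate-symmetric complex Gaussian sequence.
    z = [0j] * m
    z[0] = complex(rng.gauss(0.0, math.sqrt(2.0)), 0.0)
    z[n - 1] = complex(rng.gauss(0.0, math.sqrt(2.0)), 0.0)
    for k in range(1, n - 1):
        z[k] = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        z[m - k] = z[k].conjugate()
    # Step 4: weighted inverse transform; the imaginary part cancels.
    scale = 1.0 / (2.0 * math.sqrt(n - 1))
    return [
        scale * sum(math.sqrt(max(g, 0.0)) * z[k]
                    * cmath.exp(2j * math.pi * t * k / m)
                    for k, g in enumerate(spectrum)).real
        for t in range(n)
    ]
```

For example, `davies_harte(fd_acvs(0.4, 64), random.Random(1))` returns 64 simulated values. For long series the two direct sums would be replaced by FFT calls, as noted in the text.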
S-plus code for the Davies–Harte method, along with documentation provided by
Martin Maechler and Jan Beran, can be obtained via the World Wide Web from
StatLib at http://lib.stat.cmu.edu/S/ under the title beran. Realizations of
length 512 from several fractional difference processes (generated in S-plus) are
displayed in Figure 2.2. As the long memory parameter increases in magnitude, the
fractional difference process appears to have more and more low frequency content.

A process defined by Equation (2.3) has a long memory parameter which is constant
over time. We introduce a related process where the long memory parameter $d_t$ is
a discrete function of time, called a generalized fractional difference process (gfdp).
This process has recently appeared in a paper by Wang, Cavanaugh, and Song (1997).
We will utilize these processes later on when we investigate how the test for
homogeneity of variance reacts to a sudden change in the long memory parameter of a
generalized fractional difference process.
[Figure with panels d: 0.45, d: 0.40, d: 0.25 and d: 0.05.]
2.2.2 Simulation
Hosking (1981) looked at representing a fractional ARIMA as an infinite autoregressive
process or infinite moving average process with coefficients which may be given
explicitly; see also Beran (1994, pp. 64–65). We utilize the infinite moving average
representation in order to simulate generalized fractional difference processes. Let
$\{X_t\}$ be a generalized fractional difference process with long memory parameter $\{d_t\}$;
then it has an infinite moving average representation
$$X_t = \sum_{k=0}^{\infty} a_{t,k}\, \epsilon_{t+N-k}, \qquad t = 1, \ldots, N,$$
where $\epsilon_t$, $t = 1, 2, \ldots$, is a white noise sequence and
by Stirling's formula.
We now provide an algorithm for simulating such a process. For a realization
$X_t$, $t = 1, \ldots, N$, of a portion of a generalized fractional difference process, a white
noise sequence $\epsilon_t$, $t = 1, \ldots, mN$, is generated. The parameter $m > 1$ is a positive
integer that determines the order of the moving-average model used to generate the
realization $X_t$. When simulating generalized fractional difference processes in this
dissertation, I used $m = 2$. Once the length of previous observations is specified, each
observation $X_t$ is simply the moving average of the previous $(m - 1)N$ observations,
i.e.,
$$X_t = \sum_{k=0}^{(m-1)N-1} a_{t,k}\, \epsilon_{t-k}, \qquad t = 1, \ldots, N. \qquad (2.7)$$
The coefficients $a_{t,k}$ are functions of the time-varying long memory sequence $\{d_t\}$ and
can be defined recursively via
[Figure: autocovariance against lag (0 to 30) in four panels, d: 0.05, d: 0.25, d: 0.40
and d: 0.45; legend: Exact, q = 512, q = 1024, q = 2048, q = 4096.]
Figure 2.4: Autocovariance sequences for MA(q) approximations, using the modified
innovations variance $\hat{\sigma}^2$, to fractional difference processes, with orders ranging from
q = 512 to q = 4096. [Panels: d: 0.05, d: 0.25, d: 0.40, d: 0.45; axes: lag,
autocovariance; legend: Exact, q = 512, q = 1024, q = 2048, q = 4096.]
Hence, inserting $\hat{\sigma}^2$ into Equation (2.8) will produce an improved, in the
least-squares sense, autocovariance sequence. Simulation of fractional difference processes
is straightforward, by generating sequences of white noise with variance $\hat{\sigma}^2$ and
utilizing Equation (2.7), where $\{d_t\}$ does not vary with time. For generalized fractional
difference processes, sequences of white noise with unit variance are generated and
the innovations standard deviation $\hat{\sigma}_t$ enters into Equation (2.7) and varies with
$\{d_t\}$.
This procedure was applied to the autocovariance sequences displayed in
Figure 2.3, using only the first 30 lags in the regression, with the results shown in
Figure 2.4. There is no change in the autocovariance sequences when $d = 0.05$
or 0.25; these approximations were adequate without modification even for $q = 512$.
A marked improvement is seen for $d = 0.40$, where the approximation at initial lags is
only slightly higher and, for larger lags, is not nearly as low as before. This pattern is
even more apparent for $d = 0.45$, where the new autocovariance sequences do not fit
the exact acvs well, but fit better than in Figure 2.3. Although modifying the innovations
variance improves this method of simulation for a larger interval of $d$, as $d \to 0.5$
any finite moving-average model must have an extremely large order to adequately
capture the amount of correlation structure present in a fractional difference process.
Figure 2.5 shows realizations of four generalized fractional difference processes
($N = 512$), where $m = 2$ and the modified innovations variance $\hat{\sigma}^2$ is utilized, with
a sudden shift in the long memory parameter at $t = N/2$. Although we are only
interested in generating fractional difference processes with these sudden changes in
the difference parameter, this method is able to produce processes with parameters
which change linearly or even nonlinearly with time. Examples of such processes can
be found in Wang et al. (1997).

[Figure 2.5 panels: d: 0.05 / 0.45, d: 0.05 / 0.40, d: 0.05 / 0.25, d: 0.05 / 0.05.]
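The truncated moving-average scheme of Equation (2.7) can be sketched as follows. Two assumptions are made here because the corresponding formulas are not reproduced above: the coefficients are taken to be the usual moving-average weights of a fractional difference process evaluated at the current parameter, $a_{t,0} = 1$ and $a_{t,k} = a_{t,k-1}(k - 1 + d_t)/k$ (Hosking 1981), and the white noise has unit variance, so the modified innovations variance is not applied. The names `ma_coefficients` and `simulate_gfdp` are hypothetical.

```python
import random


def ma_coefficients(d, order):
    """First `order` moving-average weights for difference parameter d:
    psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k (assumed recursion)."""
    psi = [1.0]
    for k in range(1, order):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


def simulate_gfdp(d_seq, m=2, rng=None):
    """Simulate X_1, ..., X_N with time-varying parameter d_seq (length N).

    Each X_t is a moving average of (m - 1) * N white noise values, with
    weights determined by d_seq[t]. Unit innovations variance is assumed.
    """
    if m < 2:
        raise ValueError("m must be an integer greater than 1")
    rng = rng or random.Random()
    n = len(d_seq)
    order = (m - 1) * n
    eps = [rng.gauss(0.0, 1.0) for _ in range(m * n)]
    cache = {}
    x = []
    for t in range(n):
        d = d_seq[t]
        if d not in cache:
            cache[d] = ma_coefficients(d, order)
        a = cache[d]
        offset = order + t  # white noise index aligned with X_t
        x.append(sum(a[k] * eps[offset - k] for k in range(order)))
    return x
```

A sudden shift at $t = N/2$, as in Figure 2.5, corresponds to a call such as `simulate_gfdp([0.05] * 256 + [0.45] * 256, m=2, rng=random.Random(0))`.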
Chapter 3
most, if not all, are from an engineering perspective, except for the recent work of
Ogden (1997). Chui (1997) provides a basic synopsis of wavelet theory, while Vetterli
and Kovačević (1995) give a more thorough account from an engineering perspective.
The mathematically rigorous book by Daubechies (1992) contains a wealth of details
on her families of wavelet filters.
By utilizing notation and concepts from the first two appendices, we introduce the
Haar wavelet filter and two families of Daubechies wavelets in Section 3.1. The partial
discrete wavelet transform (DWT) and maximal overlap discrete wavelet transform
(MODWT) are briefly introduced. Algorithms for the DWT abound in the literature;
for a detailed computer algorithm of the MODWT see Appendix A of Percival
and Mofjeld (1997). The bulk of the background material on wavelets, DWT and
MODWT (including notation) is a synopsis derived from Percival and Walden (1999).
The wavelet variance is introduced along with the concept of equivalent degrees of
freedom for a time series. The distribution of the wavelet variance under the
equivalent degrees of freedom argument is compared with exact methods from the theory
of quadratic forms of normal random variables.
The first wavelet filter, the Haar wavelet (Haar 1910), remained in relative obscurity
until the convergence of several disciplines to form what we now know in a broad
sense as wavelet methodology. It is a filter of width $L = 2$ which can be succinctly
defined by its scaling coefficients
$$g_0 = g_1 = \frac{1}{\sqrt{2}}$$
or equivalently by its wavelet coefficients $h_0 = 1/\sqrt{2}$ and $h_1 = -1/\sqrt{2}$ through the
quadrature mirror relationship
in the future, "D(L)" and "LA(L)" will be used to denote Daubechies extremal phase
and least asymmetric wavelet filters of length $L$, respectively. The D(2) wavelet is
equivalent to the Haar wavelet. In general, let
$$H(f) \equiv \sum_{t=0}^{L-1} h_t e^{-i2\pi ft} \quad \text{and} \quad G(f) \equiv \sum_{t=0}^{L-1} g_t e^{-i2\pi ft}$$
define the transfer functions for the wavelet and scaling coefficients, respectively.
Recall that any arbitrary transfer function may be factored into the product of its
magnitude component and a complex exponential containing the phase component
(cf. Section A.3). The D(L) and LA(L) wavelet filters are identical in the magnitude
of their transfer functions, only differing in their phase properties. The manipulation
of these phase properties is known as spectral factorization (Percival and Walden
1999, Sec. 4.8).
Wavelet filter coefficients for the D(4) wavelet, at unit scale, are defined to be
$$h_0 = \frac{1 - \sqrt{3}}{4\sqrt{2}}, \quad h_1 = \frac{-3 + \sqrt{3}}{4\sqrt{2}}, \quad h_2 = \frac{3 + \sqrt{3}}{4\sqrt{2}} \quad \text{and} \quad h_3 = \frac{-1 - \sqrt{3}}{4\sqrt{2}},$$
and the scaling coefficients for the LA(8) are given in Table 3.1. Recall that the scaling
filter is related to the wavelet filter via the quadrature mirror filter relationship given
by Equation (3.1). The scaling coefficients defining Daubechies families of wavelet
filters of varying lengths can be found in Daubechies (1992, Ch. 6).
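As a quick numerical check of these coefficients, the sketch below builds the D(4) wavelet filter, derives a scaling filter from it, and evaluates the inner products that express unit energy and orthogonality to even shifts. The quadrature mirror convention $g_l = (-1)^{l+1} h_{L-1-l}$ used in `qmf_scaling` is an assumption (it is the convention of Percival and Walden and reproduces the Haar pair given earlier), and the function names are illustrative only.

```python
import math


def d4_wavelet_filter():
    """D(4) wavelet coefficients h_0, ..., h_3 at unit scale."""
    r3, denom = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
    return [(1 - r3) / denom, (-3 + r3) / denom,
            (3 + r3) / denom, (-1 - r3) / denom]


def qmf_scaling(h):
    """Scaling filter from a wavelet filter, assuming the convention
    g_l = (-1)^(l + 1) * h_{L - 1 - l}."""
    width = len(h)
    return [(-1) ** (l + 1) * h[width - 1 - l] for l in range(width)]


def inner(a, b, shift=0):
    """Inner product sum_l a_l * b_{l + shift} over the overlapping terms."""
    return sum(a[l] * b[l + shift] for l in range(len(a) - shift))


h = d4_wavelet_filter()
g = qmf_scaling(h)
checks = {
    "sum of h": sum(h),                      # wavelet filter sums to zero
    "energy of h": inner(h, h),              # unit energy
    "h against shift by 2": inner(h, h, 2),  # orthogonal to even shifts
    "sum of g": sum(g),                      # scaling filter sums to sqrt(2)
    "g against h": inner(g, h),              # scaling and wavelet orthogonal
}
```

Up to rounding, the entries of `checks` are 0, 1, 0, $\sqrt{2}$ and 0, which is a convenient way to catch a sign error in any one coefficient.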
More information about the properties of these wavelets can be seen when
comparing the squared gain functions of the wavelet and scaling coefficients

Table 3.1: Scaling coefficients for the Daubechies least asymmetric wavelet filter of
length L = 8, taken from Percival and Walden (1999, Sec. 4.4).
g_0 = -0.0757657147893407
g_1 = -0.0296355276459541
g_2 =  0.4976186676324578
g_3 =  0.8037387518052163
g_4 =  0.2978577956055422
g_5 = -0.0992195435769354
g_6 = -0.0126039672622612
g_7 =  0.0322231006040713
The orthonormality (Equation (3.2)) and orthogonality to its even shifts
(Equation (3.3)) seen for the Haar wavelet filter, and shared by both Daubechies families of
wavelet filters used here, can be succinctly expressed using the squared gain function
of the wavelet filter via
$$\mathcal{H}(f) + \mathcal{H}\left(f + \tfrac{1}{2}\right) = 2 \quad \text{for all } f. \qquad (3.5)$$
To illustrate these properties, we can show the Haar wavelet filter satisfies
Equation (3.5) since
$$H^{(\mathrm{Haar})}(f) = \sum_{l=0}^{1} h_l e^{-i2\pi fl} = \frac{1 - e^{-i2\pi f}}{\sqrt{2}} = i\sqrt{2}\, e^{-i\pi f} \sin(\pi f)$$
and therefore the squared gain function is $\mathcal{H}^{(\mathrm{Haar})}(f) = 2\sin^2(\pi f)$. Using the
relationship that $\cos(\pi f) = \sin\left(\pi\left(f + \tfrac{1}{2}\right)\right)$, we have
$$\mathcal{H}^{(\mathrm{Haar})}(f) + \mathcal{H}^{(\mathrm{Haar})}\left(f + \tfrac{1}{2}\right) = 2\sin^2(\pi f) + 2\cos^2(\pi f) = 2.$$
Alternative ways of expressing Equation (3.5), say, using the squared gain function
for the scaling coefficients or combinations between the two, are
$$\mathcal{G}(f) + \mathcal{G}\left(f + \tfrac{1}{2}\right) = 2 \quad \text{or} \quad \mathcal{G}(f) + \mathcal{H}(f) = 2 \quad \text{for all } f$$
and follow from the fact that they both have unit period and their quadrature mirror
relationship (Equation (3.4)).
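The identities above are easy to confirm numerically. The sketch below evaluates the squared gain function of a filter on a grid of frequencies, using the LA(8) scaling coefficients of Table 3.1 and a wavelet filter derived from them. The convention $h_l = (-1)^l g_{L-1-l}$ in `qmf_wavelet` is an assumption standing in for the quadrature mirror relationship, and the function names are illustrative.

```python
import cmath
import math

# LA(8) scaling coefficients as listed in Table 3.1.
LA8_SCALING = [
    -0.0757657147893407, -0.0296355276459541,
    0.4976186676324578, 0.8037387518052163,
    0.2978577956055422, -0.0992195435769354,
    -0.0126039672622612, 0.0322231006040713,
]


def squared_gain(filt, f):
    """Squared modulus of the transfer function sum_l c_l exp(-i 2 pi f l)."""
    transfer = sum(c * cmath.exp(-2j * math.pi * f * l)
                   for l, c in enumerate(filt))
    return abs(transfer) ** 2


def qmf_wavelet(g):
    """Wavelet filter from a scaling filter, assuming the convention
    h_l = (-1)^l * g_{L - 1 - l}."""
    width = len(g)
    return [(-1) ** l * g[width - 1 - l] for l in range(width)]


la8_wavelet = qmf_wavelet(LA8_SCALING)
grid = [i / 16 for i in range(9)]  # frequencies 0, 1/16, ..., 1/2

# Each list should be constant and equal to 2 up to rounding.
scaling_sum = [squared_gain(LA8_SCALING, f) + squared_gain(LA8_SCALING, f + 0.5)
               for f in grid]
mixed_sum = [squared_gain(LA8_SCALING, f) + squared_gain(la8_wavelet, f)
             for f in grid]
```

Plotting `squared_gain` over $0 \le f \le 1/2$ for the scaling and wavelet filters shows the low-pass and high-pass behavior directly.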
Now, the discrete wavelet transform (DWT) can be thought of as a sequence
of filtering operations which form a cascade of filters (cf. Section A.3). The
low-pass output from one filtering operation $\{X_t * g_l\}$ is the input to the next filtering
operation, where the filter is an upsampled version of the original filter. Upsampling
consists of inserting one zero between each of the elements of $\{h_l\}$ to form
$\{h_l^{\uparrow}\} \equiv \{h_0, 0, h_1, 0, \ldots, h_{L-2}, 0, h_{L-1}\}$; see, e.g., Vetterli and Kovačević (1995, Sec. 2.5.3) or
Percival and Walden (1999, Sec. 4.4). The transfer function for $\{h_l^{\uparrow}\}$ is
$$H^{\uparrow}(f) = \sum_{l=0}^{2L-2} h_l^{\uparrow} e^{-i2\pi fl} = \sum_{l=0}^{L-1} h_{2l}^{\uparrow} e^{-i2\pi f(2l)} = \sum_{l=0}^{L-1} h_l e^{-i2\pi(2f)l} = H(2f)$$
since every other element of $\{h_l^{\uparrow}\}$ is zero. Using Equation (A.4), the transfer
function for the second level wavelet filter $\{h_{2,l}\}$ is $H_2(f) \equiv H(2f)G(f)$. By a similar
argument, the transfer function for the second level scaling filter $\{g_{2,l}\}$ is determined
by convolving $\{g_l\}$ with $\{g_l^{\uparrow}\} \equiv \{g_0, 0, g_1, 0, \ldots, g_{L-2}, 0, g_{L-1}\}$ and is therefore
$G_2(f) \equiv G(2f)G(f)$. This method can be extended to an arbitrary level $j$, by
repeatedly upsampling the filters and applying Equation (A.4), yielding the following
expressions for the transfer functions of the wavelet and scaling filters, respectively,
$$H_j(f) \equiv H(2^{j-1} f) \prod_{l=0}^{j-2} G(2^l f) \quad \text{and} \quad G_j(f) \equiv \prod_{l=0}^{j-1} G(2^l f). \qquad (3.6)$$
Intuitively, a vector of wavelet coefficients for level $j$ is composed of $j - 1$ applications
of a low-pass filter (or averaging operator) followed by one application of a high-pass
filter (or difference operator), and a vector of scaling coefficients is obtained from $j$
applications of the low-pass filter.
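The cascade can be mimicked in the time domain: multiplying transfer functions corresponds to convolving filters, and evaluating a transfer function at $2^l f$ corresponds to inserting $2^l - 1$ zeros between the coefficients. The sketch below builds the level-$j$ filters of Equation (3.6) this way. The helper names are illustrative, and the direct convolution is chosen for clarity, not efficiency.

```python
import math


def upsample(filt, factor):
    """Insert factor - 1 zeros between consecutive filter coefficients."""
    out = []
    for i, c in enumerate(filt):
        out.append(c)
        if i < len(filt) - 1:
            out.extend([0.0] * (factor - 1))
    return out


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def level_filters(h, g, j):
    """Level-j wavelet and scaling filters whose transfer functions are
    H(2^(j-1) f) prod_{l<j-1} G(2^l f) and prod_{l<j} G(2^l f)."""
    if j < 1:
        raise ValueError("level must be at least 1")
    low = [1.0]  # filter for zero applications of the low-pass stage
    for level in range(1, j):
        low = convolve(low, upsample(g, 2 ** (level - 1)))
    h_j = convolve(low, upsample(h, 2 ** (j - 1)))
    g_j = convolve(low, upsample(g, 2 ** (j - 1)))
    return h_j, g_j


# Haar example: second-level filters.
root2 = math.sqrt(2.0)
haar_h, haar_g = [1 / root2, -1 / root2], [1 / root2, 1 / root2]
h2, g2 = level_filters(haar_h, haar_g, 2)
```

For the Haar filters, `h2` is `[0.5, 0.5, -0.5, -0.5]` and `g2` is `[0.5, 0.5, 0.5, 0.5]` up to rounding: one averaging step followed by one differencing step, and two averaging steps, respectively. The returned filters have $(2^j - 1)(L - 1) + 1$ coefficients.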
Figure 3.1 shows some of the common wavelets, or more specifically, wavelet basis
vectors taken from the sixth level of the transform. As the length of the wavelet filter
increases, the smoothness of the basis function increases. However, the increased
length, while improving the filters' approximation to an ideal band-pass filter,
amplifies boundary effects encountered whenever finite time series are analyzed. This
is an important feature to realize in practical situations where data may be at a
premium. From the figure, the Haar wavelet filter is a simple square-wave function,
the D(4) is quite jagged with a self-similar or fractal-like appearance to it and the
LA(8) is reasonably smooth and quite close to symmetric. When selecting a wavelet
filter, several factors must be taken into account, such as boundary effects, leakage
protection, etc. Most importantly, the wavelet filter should agree with the underlying
structure of the physical process it is analyzing.

Figure 3.1: The Haar, D(4) and LA(8) wavelet filters for level 6 (N = 512).
The squared gain functions of the wavelet and scaling filter coefficients for the
Haar, D(4) and LA(8) wavelets are given in Figure 3.2. For comparison, the vertical
dotted lines indicate the passband of frequencies for an ideal band-pass filter. The first
column in the figure shows the squared gain functions for the unit scale wavelet filters.
As the length of the filter increases, from Haar ($L = 2$) to LA(8), the approximation
to an ideal high-pass filter for $1/4 < f < 1/2$ by the wavelet coefficients improves, as does
the approximation to an ideal low-pass filter by the scaling coefficients. The Haar
wavelet filter is seen to be a poor approximation to an ideal band-pass filter for all
scales shown.
Another interesting feature to point out is the leakage of the shorter wavelet filters.
Because a high portion of low frequencies is being captured in each scale, one may
observe a fair amount of low frequency structure at smaller scales; see, e.g., Percival
and Guttorp (1994). This is due to the poor approximation to an ideal band-pass
filter by the analyzing wavelet. However, unlike spectral analysis, where leakage
must be dealt with using tapering or other pre-processing of the data, an easy way
to eliminate (or at least suppress) leakage is to increase the length of the wavelet
filter. In practice, it is a good idea to perform a wavelet decomposition using filters
of varying lengths, in order to determine if leakage is present.
30
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
0.8
0.6
0.4
0.2
0.0
D(4) D(4) D(4) D(4)
Level: 1 Level: 2 Level: 3 Level: 4
1.0
0.8
0.6
0.4
0.2
0.0
LA(8) LA(8) LA(8) LA(8)
Level: 1 Level: 2 Level: 3 Level: 4
1.0
0.8
0.6
0.4
0.2
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
Frequency
Figure 3.2: Squared gain functions for the Haar, D(4) and LA(8) wavelet lters
associated with scales 2j ; 1 j = 1 : : : 4. The solid line denotes the wavelet lter
and the dashed line denotes the scaling lter while the vertical dotted lines denote
the frequency bands for an ideal band-pass lter.
31
l=0
(cf. Equation (3.6)). The resulting wavelet filter associated with scale $\lambda_j$ has length $L_j \equiv (2^j - 1)(L - 1) + 1$. Also, define the scaling filter $\{g_J\}$ for scale $\lambda_J$ as the inverse DFT of
\[ G_{J,k} = \prod_{l=0}^{J-1} G_{1,\, 2^l k \bmod N}, \qquad k = 0, \ldots, N - 1 \]
where the non-zero wavelet filter coefficients are in reverse order. Constructing a matrix from all possible circular shifts, at a particular scale $\lambda_j$, of Equation (3.7) yields the sub-matrix $\mathcal{W}_j$. This allows us to think of the orthonormal matrix $\mathcal{W}$ as being comprised of several sub-matrices, each one stacked on top of the other, such that $\mathcal{W} = [\mathcal{W}_1, \ldots, \mathcal{W}_J, \mathcal{V}_J]^T$. For example, when $L = 4$ and $N > 4$ we get
\[
\mathcal{W}_1 =
\begin{bmatrix}
h_{1,1} & h_{1,0} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & h_{1,3} & h_{1,2} \\
h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0}
\end{bmatrix}
\tag{3.8}
\]
where $\mathcal{W}_1$ is an $N/2 \times N$ matrix whose rows are $h_1$ circularly shifted by $2m - 1$ for $m = 1, \ldots, N/2$. The remaining sub-matrices $\mathcal{W}_2, \ldots, \mathcal{W}_J$ are defined similarly to Equation (3.8), being shifted by $2^j m - 1$ for $m = 1, \ldots, N/2^j$, and $\mathcal{V}_J$ is identical in dimension to $\mathcal{W}_J$ but contains circularly shifted versions of $g_J$, instead of $h_J$, by $2^J m - 1$ for $m = 1, \ldots, N/2^J$. In practice, the rows of the matrix $\mathcal{W}$ are not explicitly constructed; instead the DWT is implemented via a pyramid algorithm (Mallat 1989) that applies the wavelet filter coefficients to the input series and subsamples the output one scale at a time.
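The pyramid idea can be illustrated with a short sketch that filters circularly and keeps every second output, then repeats on the scaling coefficients. This is only an illustrative implementation under stated assumptions: the indexing `x[(2t + 1 - l) mod n]` is chosen to match the row structure of Equation (3.8), and the function names, the Haar sign convention and the test series are hypothetical.

```python
import math

def dwt_step(x, h, g):
    """One pyramid stage: circular filtering with h and g, keeping every second output."""
    n = len(x)
    w, v = [], []
    for t in range(n // 2):
        # W_t = sum_l h_l x[(2t + 1 - l) mod n], i.e. rows shifted by 2m - 1
        w.append(sum(h[l] * x[(2 * t + 1 - l) % n] for l in range(len(h))))
        v.append(sum(g[l] * x[(2 * t + 1 - l) % n] for l in range(len(g))))
    return w, v

def partial_dwt(x, h, g, J):
    """Order-J partial DWT returned as [W_1, ..., W_J, V_J]; len(x) must be a multiple of 2**J."""
    out, v = [], list(x)
    for _ in range(J):
        w, v = dwt_step(v, h, g)
        out.append(w)
    out.append(v)
    return out

a = 1.0 / math.sqrt(2.0)
haar_h, haar_g = [a, -a], [a, a]       # assumed sign convention
x = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
coeffs = partial_dwt(x, haar_h, haar_g, 3)
energy_in = sum(v * v for v in x)
energy_out = sum(c * c for level in coeffs for c in level)
```

Because the transform is orthonormal, `energy_in` and `energy_out` agree up to rounding, and the number of coefficients halves at each level.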
the DWT does not possess. It does this by not subsampling the filtered output at each scale. A consequence of this is that the wavelet and scaling coefficients must be rescaled in order to retain the energy preserving property of the DWT.
The following properties are important in distinguishing the MODWT from the DWT (Percival and Mofjeld 1997):
1. The MODWT can handle any sample size $N$, while the $J$th order partial DWT restricts the sample size to a multiple of $2^J$.
3. The MODWT is invariant to circularly shifting the original time series. Hence, shifting the time series by a given amount will simply shift the MODWT wavelet and scaling coefficients the same amount. This property simply does not hold for the DWT.
4. While both the DWT and MODWT can perform an analysis of variance on a time series, the MODWT wavelet variance estimator is asymptotically more efficient than the same estimator based on the DWT (Percival 1995).
The transform goes by several names in the statistical and engineering literature, such as the stationary DWT and translation-invariant DWT; see Percival and Mofjeld (1997) for more details.
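The shift property in the list above can be checked numerically with a small sketch. It assumes the unit scale Haar filter with rescaled MODWT coefficients $(\frac{1}{2}, -\frac{1}{2})$ and circular boundary treatment; the function names and the example series are hypothetical.

```python
def haar_modwt_level1(x):
    """Level-1 Haar MODWT wavelet coefficients, circular boundary: (x_t - x_{t-1}) / 2."""
    n = len(x)
    return [(x[t] - x[(t - 1) % n]) / 2.0 for t in range(n)]

def haar_dwt_level1(x):
    """Level-1 Haar DWT wavelet coefficients: (x_{2t+1} - x_{2t}) / sqrt(2)."""
    return [(x[2 * t + 1] - x[2 * t]) / 2 ** 0.5 for t in range(len(x) // 2)]

def circular_shift(x, k):
    """Shift the series k places to the right, wrapping around."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
shifted = circular_shift(x, 1)

# MODWT: transforming the shifted series equals shifting the transformed series.
modwt_commutes = haar_modwt_level1(shifted) == circular_shift(haar_modwt_level1(x), 1)
# DWT: the shifted series produces a different set of coefficient values altogether.
dwt_same_values = sorted(haar_dwt_level1(shifted)) == sorted(haar_dwt_level1(x))
```

Here `modwt_commutes` is true while `dwt_same_values` is false, and `haar_modwt_level1` also accepts a series of odd length, in line with the first property.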
3.3.2 Definition
The brief introduction presented here follows from Percival and Mofjeld (1997). A
thorough discussion of the MODWT will appear in Percival and Walden (1999, Ch. 5).
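The notation follows from the DWT, with the $J$th order partial MODWT being defined by $\widetilde{\mathbf{W}} = \widetilde{\mathcal{W}} \mathbf{X}$, where $\widetilde{\mathbf{W}}$ is composed of $J + 1$ length $N$ vectors, $\widetilde{\mathbf{W}}_1, \ldots, \widetilde{\mathbf{W}}_J$ and $\widetilde{\mathbf{V}}_J$, which can be arranged in the following manner
\[ \widetilde{\mathbf{W}} \equiv \left[ \widetilde{\mathbf{W}}_1 \;\; \widetilde{\mathbf{W}}_2 \;\; \cdots \;\; \widetilde{\mathbf{W}}_J \;\; \widetilde{\mathbf{V}}_J \right]^T. \]
The vector of wavelet coefficients $\widetilde{\mathbf{W}}_j$ is associated with changes of length $\lambda_j = 2^{j-1}$ and $\widetilde{\mathbf{V}}_J$ is associated with averages of lengths $\lambda_j$ and higher. For time series of dyadic length, the MODWT can be subsampled and rescaled to obtain the DWT.
Similar to the matrix $\mathcal{W}$ for the DWT, the matrix $\widetilde{\mathcal{W}}$ is also made up of $J + 1$ sub-matrices, each of them $N \times N$, and may be expressed as $\widetilde{\mathcal{W}} = [\widetilde{\mathcal{W}}_1, \ldots, \widetilde{\mathcal{W}}_J, \widetilde{\mathcal{V}}_J]^T$. In this case, when $L = 4$ and $N > 4$, we have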
2 3
~h10 0 0 0 0 0 0 0 0 0 h~ 13 ~h12 ~h11
66 77
66 h11 h10 0 0 0 0 0 0 0 0 0 h13 h12 77
~ ~ ~ ~
66 ~h12 ~h11 h~ 10 0 0 0 0 0 0 0 0 0 ~h13 77
66 7
66 ~h13 ~h12 h~ 11 h~ 10 0 0 0 0 0 0 0 0 0 777
66 0 ~h13 h~ 12 h~ 11 ~h10 0 0 0 0 0 0 0 0 77
f1 = 66
W 7
66 0 0 h~ 13 h~ 12 ~h11 ~h10 0 0 0 0 0 0 0 777
66 ... ... ... ... ... ... . . . ... ... ... ... ... ... ... 77
66 77
66 0 0 0 0 0 0 0 h13 h12 h11 h10 0 0 77
~ ~ ~ ~
66 0 0 0 0 0 0 0 0 ~h13 h~ 12 h~ 11 ~h10 0 77
4 5
~
0 0 0 0 0 0 0 0 0 h13 h12 h11 h10 ~ ~ ~
(3.9)
where $\widetilde{\mathcal{W}}_1$ is an $N \times N$ matrix whose rows are the rescaled wavelet filter coefficients $\tilde{h}_1 = h_1 / 2^{1/2}$ circularly shifted by $m - 1$ for $m = 1, \ldots, N$. In general, let $\tilde{h}_j \equiv h_j / 2^{j/2}$ and $\tilde{g}_J \equiv g_J / 2^{J/2}$ be, respectively, the rescaled wavelet and scaling filter coefficients required to construct $\widetilde{\mathcal{W}}$. The remaining sub-matrices $\widetilde{\mathcal{W}}_2, \ldots, \widetilde{\mathcal{W}}_J$ are constructed similarly to Equation (3.9), and $\widetilde{\mathcal{V}}_J$ has the same structure as $\widetilde{\mathcal{W}}_J$, only using circularly shifted scaling coefficients instead of the wavelet coefficients. Circular shifting for all scales is identical to that of Equation (3.9). In practice, a pyramid scheme is utilized similar to that of the DWT; see, e.g., Percival and Walden (1999, Sec. 5.4).
Percival (1995) showed that the wavelet variance decomposes the variance of $\{X_t\}$ on a scale by scale basis; i.e.,
\[ \sum_{j=1}^{\infty} \sigma_X^2(\lambda_j) = \mathrm{Var}\{X_t\}. \]
This is analogous to the spectral density function (Equation (B.2)), which decomposes the variance of $\{X_t\}$ on a frequency by frequency basis.
We can form an unbiased estimator of the wavelet variance based upon the MODWT using
\[ \tilde{\sigma}_X^2(\lambda_j) \equiv \frac{1}{\widetilde{N}_j} \sum_{t=L_j-1}^{N-1} \widetilde{W}_{j,t}^2 \tag{3.11} \]
which yields the result in Equation (3.10). The unbiased estimator based on the DWT is given by
\[ \hat{\sigma}_X^2(\lambda_j) \equiv \frac{1}{2^j \widehat{N}_j} \sum_{t=L'_j}^{N_j-1} W_{j,t}^2 \tag{3.12} \]
where $N_j = N/2^j$, $\widehat{N}_j = N_j - L'_j$ and $L'_j = \lceil (L - 2)(1 - 2^{-j}) \rceil$. We utilize the wavelet variance not only when defining the wavelet correlation between two time series, but also in Chapter 6 when analyzing time series from the physical sciences.
ordinates; see, for example, Priestley (1981, pp. 466–468), Brillinger (1981, pp. 145–146) and Percival and Walden (1993, Sec. 6.10). The equivalent degrees of freedom concept has been used, for example, in estimating the statistical bandwidth of a time series (Walden and White 1990) and establishing confidence intervals for the wavelet variance based on the MODWT (Percival 1995).
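Let $\{\widetilde{W}_{j,t}\}$ be a vector of MODWT wavelet coefficients associated with scale $\lambda_j$ for a real-valued Gaussian process $\{X_t\}$ whose $d$th order backward difference is stationary, and let $L > 2d$. We define $\{s_{X,\tau}(\lambda_j)\}$ to be the known autocovariance sequence of $\{\widetilde{W}_{j,t}\}$. If we assume the MODWT estimator of wavelet variance $\tilde{\sigma}_X^2(\lambda_j)$ can be approximated via
\[ \tilde{\sigma}_X^2(\lambda_j) \stackrel{d}{=} b_j \chi^2_{\eta_j} \tag{3.13} \]
(Rice 1945), where "$\stackrel{d}{=}$" means equal in distribution, we obtain
\[ \eta_j = \frac{\widetilde{N}_j \, \sigma_X^4(\lambda_j)}{\displaystyle \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \left( 1 - \frac{|\tau|}{\widetilde{N}_j} \right) s_{X,\tau}^2(\lambda_j)} \tag{3.14} \]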
Table 3.2: Equivalent degrees of freedom for the MODWT of white noise (N = 512) using the Haar, D(4), LA(8), and LA(16) wavelet filters. The numbers in parentheses are $\widetilde{N}_j$, the number of MODWT wavelet coefficients unaffected by the boundary conditions.

                                       Level
Wavelet   1            2            3            4            5            6
Haar      341.8 (511)  293.3 (509)  179.0 (505)  95.0 (497)   48.6 (481)   24.8 (449)
D(4)      312.6 (509)  244.1 (503)  129.9 (491)  65.6 (467)   33.2 (419)   16.9 (323)
LA(8)     293.9 (505)  204.0 (491)  103.1 (463)  51.9 (407)   26.3 (295)   13.5 (71)
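The parenthesized counts in Table 3.2 can be recovered from the filter length formula $L_j = (2^j - 1)(L - 1) + 1$ given earlier. The sketch below assumes that the number of boundary-free coefficients is $\widetilde{N}_j = N - L_j + 1$, which is consistent with the lower summation limit in Equation (3.11) but is not stated explicitly in this excerpt; the function names are hypothetical.

```python
def equivalent_filter_length(L, j):
    """Length of the level-j equivalent wavelet filter, L_j = (2^j - 1)(L - 1) + 1."""
    return (2 ** j - 1) * (L - 1) + 1

def unaffected_count(N, L, j):
    """Assumed count of boundary-free MODWT coefficients, N - L_j + 1."""
    return N - equivalent_filter_length(L, j) + 1

def modwt_wavelet_variance(w_tilde, L_j):
    """Estimator in the form of Eq. (3.11): mean square of the boundary-free coefficients."""
    kept = w_tilde[L_j - 1:]            # drop the first L_j - 1 coefficients
    return sum(c * c for c in kept) / len(kept)

N = 512
counts = {name: [unaffected_count(N, L, j) for j in range(1, 7)]
          for name, L in [("Haar", 2), ("D(4)", 4), ("LA(8)", 8)]}
```

Under that assumption, `counts` matches every parenthesized entry of the table, including the sharp drop to 71 coefficients for the LA(8) filter at level 6.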
If $L_j \geq 2$, then the ratio must be smaller than 1 and thus any application of the MODWT will result in a decrease in the effective sample size. Table 3.3 gives the large sample approximation to the ratio $\eta_j / \widetilde{N}_j$ for MODWT wavelet coefficients applied to white noise. Comparing the results from Table 3.2 for a sample size of N = 512, the large sample approximation gives $\eta_1 = 341.3$ versus 341.8 (a difference of $\frac{1}{2}$ df) for the unit scale Haar wavelet filter. The difference between the two methods is less than 1 df for all other scales and wavelet filters. We therefore recommend the large sample approximation for moderate to large sample sizes in practice, keeping in mind the estimate will be slightly conservative.

Table 3.3: Large sample approximation to the ratio of equivalent degrees of freedom $\eta_j / \widetilde{N}_j$.

                            Level
Wavelet   1       2       3       4       5       6
Haar      0.6667  0.5714  0.3478  0.1839  0.0933  0.0468
D(4)      0.6095  0.4752  0.2521  0.1266  0.0633  0.0317
LA(8)     0.5730  0.3968  0.1999  0.0999  0.0500  0.0250
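As a rough check on the large sample ratio, one can drop the factor $(1 - |\tau|/\widetilde{N}_j)$ in Equation (3.14) so that, for white noise, the ratio becomes $s_{X,0}^2 / \sum_\tau s_{X,\tau}^2$, with the autocovariance computed directly from the level-$j$ filter. The sketch below is an illustration under those assumptions and is not the computation used for the tables; the helper names are hypothetical and the D(4) coefficients are the standard closed-form values.

```python
import math

def upsample(filt, factor):
    """Insert factor - 1 zeros between consecutive filter coefficients."""
    out = []
    for i, c in enumerate(filt):
        out.append(c)
        if i < len(filt) - 1:
            out.extend([0.0] * (factor - 1))
    return out

def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for k, y in enumerate(b):
            out[i + k] += x * y
    return out

def level_wavelet_filter(h, g, j):
    """Level-j equivalent wavelet filter via the cascade of Eq. (3.6)."""
    acc = [1.0]
    for l in range(j - 1):
        acc = convolve(acc, upsample(g, 2 ** l))
    return convolve(acc, upsample(h, 2 ** (j - 1)))

def autocovariance(filt):
    """Lags 0, 1, ... of the autocovariance of unit-variance white noise filtered by filt."""
    n = len(filt)
    return [sum(filt[l] * filt[l + tau] for l in range(n - tau)) for tau in range(n)]

def edof_ratio(h, g, j):
    """Large-sample eta_j / N_j for white noise: s_0^2 divided by the sum of s_tau^2 over all lags."""
    h_j = [c / 2.0 ** (j / 2.0) for c in level_wavelet_filter(h, g, j)]  # MODWT rescaling
    s = autocovariance(h_j)
    return s[0] ** 2 / (s[0] ** 2 + 2.0 * sum(v * v for v in s[1:]))

r2, r3 = math.sqrt(2.0), math.sqrt(3.0)
haar_g, haar_h = [1 / r2, 1 / r2], [1 / r2, -1 / r2]
d4_g = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
d4_h = [(-1) ** l * d4_g[3 - l] for l in range(4)]   # quadrature mirror relation
haar_ratios = [edof_ratio(haar_h, haar_g, j) for j in range(1, 4)]
d4_ratio_1 = edof_ratio(d4_h, d4_g, 1)
```

Under these assumptions the first three Haar values and the first D(4) value agree with the corresponding entries of Table 3.3 to four decimal places.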
To see how well Equation (3.13) holds for MODWT wavelet coefficients, a small Monte Carlo study (100 iterations) was performed to simulate the MODWT wavelet variance $\tilde{\sigma}_X^2(\lambda_j)$ and compare the simulated values to the appropriate $\chi^2$ distributions. Sequences of Gaussian white noise (N = 512) were simulated, and a partial MODWT ($J = 6$), using the Haar wavelet filter, was applied to them. Wavelet coefficients affected by the boundary were discarded and the variance for each scale calculated. Figure 3.3 shows quantile-quantile plots for the estimated variances against $\chi^2$ distributions with degrees of freedom given in the first row of Table 3.2. Even for the higher scales, where there are relatively few degrees of freedom involved, the MODWT estimates of wavelet variance $\tilde{\sigma}_X^2(\lambda_j)$ appear to follow the approximation given in Equation (3.13).
Figure 3.3: Quantile-quantile plots for the MODWT wavelet variance, using the Haar wavelet filter, against a $\chi^2$ distribution with degrees of freedom taken from the first row of Table 3.2 (panels: Levels 1 to 6; vertical axis: rescaled wavelet variance). The wavelet variance has been adjusted at each scale in order to more easily compare it with the $\chi^2$ distribution.
The exact distribution of the MODWT wavelet variance $\tilde{\sigma}_X^2(\lambda_j)$ can be found using the theory of quadratic forms of normal (Gaussian) random variables. Percival (1983, Sec. 2.5) investigated this for the Allan variance (Allan 1966), which is proportional to the wavelet variance using the Haar wavelet filter. Let us define
\[ Q \equiv \frac{\widetilde{N}_j \, \tilde{\sigma}_X^2(\lambda_j)}{\sigma_X^2(\lambda_j)} = \frac{\sum_{l=L_j-1}^{N-1} \widetilde{W}_{j,l}^2}{\sigma_X^2(\lambda_j)} \tag{3.15} \]
to be the quadratic form of interest. Now, we may rewrite Equation (3.15) as
\[ Q = \sum_{i=1}^{\widetilde{N}_j} \alpha_i U_i^2 \]
where $\{U_i^2\}$ are independent $\chi^2$ random variables with 1 degree of freedom each and $\{\alpha_i\}$ are the eigenvalues of the autocorrelation matrix $B$; see, e.g., Johnson and Kotz (1970, Ch. 29). The matrix $B$ is band diagonal and computed by dividing the inverse DFT of the squared gain function for scale $\lambda_j$ (i.e., the autocovariance sequence $\{s_{X,\tau}(\lambda_j)\}$) by the wavelet variance $\sigma_X^2(\lambda_j)$. To evaluate the distribution function of $Q$ we utilize the method of Imhof (1961), where the characteristic function of $Q$ is numerically inverted.
The exact distribution for $Q$ was obtained from a combination of S-Plus and FORTRAN code graciously provided by Professor R. Lockhart. We are not interested in the distribution of $Q$ per se, but instead are interested in the distribution of the wavelet variance. This is easily obtained from the software using Equation (3.15) to obtain
\[ P\{Q \leq q_p\} = P\left\{ \frac{\widetilde{N}_j \, \tilde{\sigma}_X^2(\lambda_j)}{\sigma_X^2(\lambda_j)} \leq q_p \right\} = P\left\{ \tilde{\sigma}_X^2(\lambda_j) \leq \frac{\sigma_X^2(\lambda_j) \, q_p}{\widetilde{N}_j} \right\} = p \]
where $q_p$ is the $p$th quantile of the distribution of $Q$. For comparison, we note the corresponding distribution of the wavelet variance assuming the equivalent degrees of freedom argument (Equation (3.13)) is given by
\[ P\left\{ \tilde{\sigma}_X^2(\lambda_j) \leq b_j \chi_p \right\} = p \]
Figure 3.4a: Cumulative distribution functions for the MODWT wavelet variance, using the Haar wavelet filter (horizontal axis: wavelet variance; one panel per level). A sample size of N = 128 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

Figure 3.4b: Cumulative distribution functions for the MODWT wavelet variance, using the D(4) wavelet filter (horizontal axis: wavelet variance; one panel per level). A sample size of N = 256 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

Figure 3.4c: Cumulative distribution functions for the MODWT wavelet variance, using the LA(8) wavelet filter (horizontal axis: wavelet variance; one panel per level). A sample size of N = 256 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.
where $\chi_p$ is the $p$th quantile from a $\chi^2$ distribution with $\eta_j$ degrees of freedom. The results for the Haar wavelet filter are given in Figure 3.4a with the D(4) and LA(8) wavelet filters in Figures 3.4b–c, respectively. We see the distributions agree very well for the smaller scales: the two curves are virtually on top of one another. However, they begin to diverge slightly for higher scales (small equivalent degrees of freedom). The software limited the maximum sample size that could be analyzed. This is the reason for displaying fewer scales with respect to the D(4) and LA(8) wavelet filters.
Although the distributions based on the equivalent degrees of freedom argument do not follow the true distribution for some scales, it is difficult to determine the impact this would have when using the equivalent degrees of freedom argument to modify hypothesis tests for homogeneity of variance. This point is explicitly investigated in Section 4.4.1, where the cumulative sum of squares test statistic formed with MODWT wavelet coefficients is adjusted using the equivalent degrees of freedom on a scale by scale basis. Simulations are run to compare the empirical size of this hypothesis testing procedure using asymptotic critical values from the DWT applied to white noise.
Chapter 4
TESTING HOMOGENEITY OF VARIANCE
Suppose we have a time series that we are considering modeling as a realization of one portion $Y_1, \ldots, Y_N$ of a stationary Gaussian fractional difference process $\{Y_t\}$ defined by Equation (2.3). An important assumption behind any stationary process is that its variance is a constant independent of the time index $t$. In the context of short memory models, such as stationary autoregressive and moving average (ARMA) processes, a number of tests have been proposed for homogeneity of variance. For a time series consisting of either independent Gaussian random variables with zero mean and possibly time-dependent variances $\sigma_t^2$ or a moving average of such variables, Nuri and Herbst (1969) proposed to test the hypothesis that $\sigma_t^2$ is constant for all $t$ by using the periodogram of the squared random variables. Wichern, Miller, and Hsu (1976) proposed a moving block procedure for detecting a single change of variance at an unknown time point in an autoregressive model of order one. Hsu (1977, 1979) studied the detection of a variance shift at a single unknown point in a sequence of independent observations. Davis (1979) studied tests for a shift in the innovation variance of an autoregressive process at a specified point. Abraham and Wei (1984) used a Bayesian framework to study changes in the innovation variance of an ARMA process. Tsay (1988) looked at detecting several types of disturbances in time series, among them variance changes, by analyzing the residuals from fitting an ARMA model. Srivastava (1993) found the cumulative sum of squares procedure to perform better than the exponentially weighted moving average procedure for detecting an increase in variance in white noise sequences. Inclán and Tiao (1994) investigated the detection of multiple changes of variance in sequences of independent Gaussian
are presented when detecting single and multiple variance change points. The ability of the cumulative sum of squares test statistic to detect a change in the long memory parameter of a fractional difference process is briefly investigated. Applications can be found in Section 6.1, where we analyze the annual minimum water levels of the Nile River, and Section 6.2, where we investigate a series of measurements related to vertical shear in the ocean.
(Anderson 1971, p. 388), and from Section A.2, we know that filtering a time series corresponds to a multiplication of its spectrum with the squared gain function of the
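is the spectrum of the DWT wavelet coefficients for unit scale. For the class of Daubechies wavelets, we have a closed form expression for $\mathcal{H}_1(\cdot)$, namely,
\[ \mathcal{H}_1^{(D)}(f) \equiv 2 \sin^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \cos^{2l}(\pi f) = \mathcal{D}^{L/2}(f) \, \mathcal{C}(f) \tag{4.3} \]
(Daubechies 1992, Ch. 6.1), where $\mathcal{D}(f) \equiv |2 \sin(\pi f)|^2$ corresponds to a first order backward difference filter and
\[ \mathcal{C}(f) \equiv \frac{1}{2^{L-1}} \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \cos^{2l}(\pi f). \]
Substituting Equations (2.3) and (4.3) into Equation (4.2) gives us
\begin{align*}
S_1^{(D)}(f) &= \frac{1}{2} \left[ \mathcal{D}^{L/2}\!\left(\tfrac{f}{2}\right) \mathcal{C}\!\left(\tfrac{f}{2}\right) S\!\left(\tfrac{f}{2}\right) + \mathcal{D}^{L/2}\!\left(\tfrac{f}{2} + \tfrac{1}{2}\right) \mathcal{C}\!\left(\tfrac{f}{2} + \tfrac{1}{2}\right) S\!\left(\tfrac{f}{2} + \tfrac{1}{2}\right) \right] \\
&= \frac{1}{2} \left[ \mathcal{D}^{-(d - L/2)}\!\left(\tfrac{f}{2}\right) \mathcal{C}\!\left(\tfrac{f}{2}\right) + \mathcal{D}^{-(d - L/2)}\!\left(\tfrac{f}{2} + \tfrac{1}{2}\right) \mathcal{C}\!\left(\tfrac{f}{2} + \tfrac{1}{2}\right) \right].
\end{align*}
The first filter $\mathcal{D}^{-(d - L/2)}(\cdot)$ corresponds to a fractional difference process with difference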
\[ \mathcal{H}_1^{(\mathrm{Haar})}(f) \, S(f) = \tfrac{1}{2} \, |2 \sin(\pi f)|^{-2(d-1)}. \]
That is, the spectrum of the filtered process is proportional to a fractional difference process with parameter $d' = d - 1$. Since we were looking at so-called "red" processes with $0 < d < \frac{1}{2}$, this means $-1 < d' < -\frac{1}{2}$ and hence the filtered process is "blue." The colorful terminology comes from optics, where low frequencies of light are seen as red and high frequencies seen as blue. The spectrum of the DWT wavelet coefficients using the Haar wavelet is
\[ S_1^{(\mathrm{Haar})}(f) = \frac{1}{4} \left[ \left| 2 \sin\!\left(\tfrac{\pi f}{2}\right) \right|^{-2(d-1)} + \left| 2 \cos\!\left(\tfrac{\pi f}{2}\right) \right|^{-2(d-1)} \right]. \]
When $d = 0$, $\{X_t\}$ is a white noise process and the spectrum of the DWT wavelet coefficients for unit scale is constant, as is to be expected. Figure 4.1 shows the theoretical spectra for the unit scale DWT coefficients of fractional difference processes with $-\frac{1}{2} \leq d \leq \frac{1}{2}$. As the long memory parameter approaches 0.5 or $-0.5$, the wavelet coefficients have a greater amount of correlation. However, the range of the vertical axis in each plot is only from $-3$ to 3 decibels, and the variation for any particular spectrum is less than this range, so the spectrum for any choice of long memory parameter does not have much structure beyond that of white noise.
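The Haar expression above is simple enough to evaluate directly. The following sketch computes the unit scale spectrum on a frequency grid and reports its dynamic range in decibels, taken here as $10 \log_{10}$ of the ratio of the largest to the smallest value; the grid size, the function names and the unit innovations variance are assumptions made for illustration.

```python
import math

def haar_dwt_spectrum(f, d):
    """Unit-scale Haar DWT spectrum of a fractional difference process (unit innovations variance)."""
    p = -2.0 * (d - 1.0)
    return 0.25 * (abs(2.0 * math.sin(math.pi * f / 2.0)) ** p
                   + abs(2.0 * math.cos(math.pi * f / 2.0)) ** p)

def dynamic_range_db(d, n_grid=500):
    """10 log10(max / min) of the spectrum over an even grid on [0, 1/2]."""
    values = [haar_dwt_spectrum(0.5 * i / n_grid, d) for i in range(n_grid + 1)]
    return 10.0 * math.log10(max(values) / min(values))

flat_at_zero = dynamic_range_db(0.0)        # white noise: spectrum is constant
ranges = {d: dynamic_range_db(d) for d in (-0.45, -0.25, 0.25, 0.45)}
```

With $d = 0$ the spectrum is identically 1, and for the other values of $d$ tried here the range stays well inside the 3 dB span of the vertical axis described above.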
The formula given in Equation (4.2) can be extended to an arbitrary scale $\lambda_j$ using the notion of a cascading filter (cf. Section A.3). The first step is to separate $\mathcal{H}_j^{(D)}(\cdot)$ into pieces using an equivalent formula to Equation (3.6) for squared gain functions, namely,
where
\[ \mathcal{G}_1^{(D)}(f) \equiv 2 \cos^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \sin^{2l}(\pi f) \]
is the squared gain function for the Daubechies scaling coefficients. Using the trigonometric identity $\sin^2(2\pi f) = 4 \sin^2(\pi f) \cos^2(\pi f)$, we can re-express the first term of Equation (4.4) via
\[ \mathcal{D}(2^{j-1} f) = \mathcal{D}(f) \prod_{k=0}^{j-2} 4 \cos^2(2^k \pi f). \]
Figure 4.1: Theoretical spectra for the unit scale DWT wavelet coefficients of fractional difference processes (panels: Haar, D(4) and LA(8)). The z-axis ranges from $-3$ to 3 dB and, for ease of viewing, the x- and y-axes have been reversed: the long memory parameter $d$ goes from $-0.5$ to 0.5 and the frequency goes from 0 to 0.5 in the direction of the arrow.
Since we are downsampling by 2 at each level of the transform, the spectrum for a vector of DWT wavelet coefficients $\mathbf{W}_j$ associated with scale $\lambda_j$ is
\[ S_j^{(D)}(f) = \frac{1}{2^j} \sum_{k=0}^{2^j - 1} \mathcal{H}_j^{(D)}\!\left( \frac{f}{2^j} + \frac{k}{2^j} \right) S\!\left( \frac{f}{2^j} + \frac{k}{2^j} \right) \]
where
\[ \mathcal{H}_j^{(D)}(f) = \mathcal{D}^{L/2}(f) \left[ \prod_{k=0}^{j-2} 4 \cos^2(2^k \pi f) \right]^{L/2} \mathcal{C}(2^{j-1} f) \, \mathcal{G}_{j-1}^{(D)}(f). \tag{4.5} \]
That is, the spectrum is stretched by $2^j$, and then $2^j - 1$ aliased versions are added to it (Vetterli and Kovačević 1995, p. 66). This can intuitively be seen through successive applications of Equation (4.1) to the filtered spectrum.
Table 4.1: Maximum dynamic range for the spectra of DWT wavelet coefficients, in decibels (dB), when applied to fractional difference processes with long memory parameter $d$.
(Percival and Walden 1993, p. 201). Table 4.1 gives the maximum dynamic ranges, in dB, for the spectra of DWT coefficients applied to fractional difference processes with long memory parameter $-\frac{1}{2} \leq d \leq \frac{1}{2}$. As the level of the DWT increases, where more and more energy is present for red processes, the dynamic range of the spectra is negligible and appears to level off around 3 dB regardless of wavelet filter. This lack of dynamic range, which corresponds to almost uncorrelated observations in the original process, is utilized in the next chapter to test for nonstationary events in the presence of long memory structure.
Table 4.2: Maximum dynamic range for the spectra of DWT wavelet coefficients, in decibels (dB), when applied to AR(1) and MA(1) processes with parameters $\phi$ and $\theta$, respectively.
When $\theta = 0$ for the MA(1) process, or $\phi = 0$ for the AR(1) process, we see that the spectra equal 1 for all frequencies. This is to be expected since the processes
4.1.3 Conclusions
The DWT has been shown, through spectral theory, to approximately decorrelate both short memory and long memory processes. As seen in Figures 4.2 and 4.3, this attribute appears to fail when the process asymptotes in the high frequency range, say $f = 0.5$, instead of in the low frequency range. If this occurs in practice, there are at least two simple ways to overcome this problem. First, a signal processing trick of multiplying every other value of the time series by $-1$ will reverse the spectrum of the original series. A large amount of energy in high frequencies will therefore be shifted into the lower frequencies, where the DWT has been shown to approximately
Figure 4.2: Theoretical spectra for the unit scale DWT wavelet coefficients of an AR(1) process (panels: Haar, D(4) and LA(8)). The z-axis ranges from $-5$ to 40 dB and, for ease of viewing, the x- and y-axes have been reversed: the parameter $\phi$ goes from $-0.99$ to 0.99 and the frequency goes from 0 to 0.5 in the direction of the arrow.

Figure 4.3: Theoretical spectra for the unit scale DWT wavelet coefficients of an MA(1) process (panels: Haar, D(4) and LA(8)). The z-axis ranges from $-40$ to 5 dB and, for ease of viewing, the x- and y-axes have been reversed: the parameter $\theta$ goes from $-0.99$ to 0.99 and the frequency goes from 0 to 0.5 in the direction of the arrow.
Percentage points for the distribution of $D$ under the null hypothesis can be readily obtained through Monte Carlo simulations. When $N = 2$,
\[
P\{D \leq x\} =
\begin{cases}
0, & x < \frac{1}{2} \\
P\left\{ 1 - x \leq B\!\left(\frac{1}{2}, \frac{1}{2}\right) \leq x \right\}, & \frac{1}{2} \leq x < 1 \\
1, & x \geq 1
\end{cases}
\tag{4.9}
\]
where $B(\frac{1}{2}, \frac{1}{2})$ is a beta random variable with parameters $\frac{1}{2}$ and $\frac{1}{2}$. The proof of this is straightforward. Let $X_1$ and $X_2$ be two independent Gaussian random variables with zero means and common variance (under $H_0$), then
\[ P_1 = \frac{X_1^2}{X_1^2 + X_2^2} \quad \text{and} \quad P_2 = 1. \]
The random variable $P_1$ has a beta distribution with parameters $\frac{1}{2}$ and $\frac{1}{2}$. Now, the preliminary statistics are given by $D^- = P_1$ and $D^+ = 1 - P_1$, therefore
\begin{align*}
P\{D \leq x\} &= P\{\max(P_1, 1 - P_1) \leq x\} \\
&= P\{P_1 \leq x, \; 1 - P_1 \leq x\} \\
&= P\{1 - x \leq P_1 \leq x\}
\end{align*}
and Equation (4.9) follows directly.
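The $N = 2$ result can be checked numerically. The beta distribution with both parameters equal to $\frac{1}{2}$ is the arcsine distribution, whose distribution function is $(2/\pi) \arcsin(\sqrt{u})$, so Equation (4.9) has a simple closed form on $\frac{1}{2} \leq x < 1$. The sketch below compares that form with a small simulation; the seed, the replicate count and the function names are arbitrary choices for illustration.

```python
import math
import random

def cdf_d_n2(x):
    """P{D <= x} for N = 2, using the arcsine form of the Beta(1/2, 1/2) distribution."""
    if x < 0.5:
        return 0.0
    if x >= 1.0:
        return 1.0
    beta_cdf = lambda u: (2.0 / math.pi) * math.asin(math.sqrt(u))
    return beta_cdf(x) - beta_cdf(1.0 - x)

def simulate_d_n2(n_rep, seed=0):
    """Draw D = max(P_1, 1 - P_1) with P_1 = X_1^2 / (X_1^2 + X_2^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_rep):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p1 = x1 * x1 / (x1 * x1 + x2 * x2)
        draws.append(max(p1, 1.0 - p1))
    return draws

draws = simulate_d_n2(20000)
empirical = sum(1 for v in draws if v <= 0.75) / len(draws)
exact = cdf_d_n2(0.75)      # equals (2/pi)(pi/3 - pi/6) = 1/3
```

The empirical proportion should fall close to the exact value of one third, up to Monte Carlo error.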
There is no known tractable closed form expression for $P\{D \leq x\}$ with arbitrary $N$. Hsu (1977) commented on this fact and used two methods to obtain small sample critical values: Edgeworth expansions and fitting the first three moments of his statistic, which is equivalent to $D$, to a one-parameter beta distribution. Inclán and Tiao (1994) proved that, for large $N$ and $x > 0$,
\[ P\left\{ \sqrt{\frac{N}{2}} \, D \leq x \right\} \approx P\left\{ \sup_t |W_t^0| \leq x \right\} = 1 + 2 \sum_{l=1}^{\infty} (-1)^l e^{-2 l^2 x^2} \]
where $W_t^0$ is a Brownian bridge process, and the right-hand expression is Equation (11.39) of Billingsley (1968). Table 4.3 shows how quickly the Monte Carlo critical values converge to the quantiles of the Brownian bridge process.
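The asymptotic quantiles can be obtained by truncating the series above and solving for $x$ numerically. The sketch below uses plain bisection; the truncation point, search bracket and tolerance are arbitrary illustrative choices rather than anything prescribed in the text.

```python
import math

def bridge_sup_cdf(x, terms=100):
    """P{sup |W0_t| <= x} = 1 + 2 sum_{l>=1} (-1)^l exp(-2 l^2 x^2), truncated after `terms` terms."""
    if x <= 0.0:
        return 0.0
    return 1.0 + 2.0 * sum((-1) ** l * math.exp(-2.0 * l * l * x * x)
                           for l in range(1, terms + 1))

def critical_value(alpha, lo=0.3, hi=3.0, tol=1e-10):
    """Bisection for the x at which bridge_sup_cdf(x) equals 1 - alpha."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bridge_sup_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

asymptotic = {alpha: critical_value(alpha) for alpha in (0.10, 0.05, 0.01)}
```

To three decimal places these values agree with the Brownian bridge quantiles listed in the last column of Table 4.3.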
Table 4.3: Monte Carlo critical values for the test statistic $(N/2)^{1/2} D$, using the Haar wavelet filter, for a level $\alpha$ test. These values are based upon 10,000 replicates. The standard error (SE) is provided for each estimate, and was computed via $\mathrm{SE} = \{\alpha(1 - \alpha)/(10{,}000 f^2)\}^{1/2}$, where $f$ is the histogram estimate of the density at the $(1 - \alpha)$th quantile using a bandwidth of 0.01 (Inclán and Tiao 1994). Quantiles of a Brownian bridge process are given at the far right for comparison.

                                  Sample size
$\alpha$  8      16     32     64     128    256    512    1024   $\infty$
0.10      1.109  1.135  1.157  1.182  1.193  1.197  1.206  1.209  1.224
SE        0.003  0.003  0.003  0.003  0.003  0.003  0.003  0.003
0.05      1.232  1.265  1.293  1.313  1.326  1.329  1.345  1.341  1.358
SE        0.004  0.004  0.004  0.004  0.004  0.004  0.004  0.004
0.01      1.459  1.508  1.553  1.584  1.596  1.596  1.630  1.617  1.628
SE        0.007  0.008  0.008  0.009  0.008  0.010  0.008  0.007
[2] If the null hypothesis is rejected, remove the wavelet coefficient with the greatest absolute value, reduce $t$ to $t - 1$, and return to [1].
[3] If the null hypothesis is not rejected, set the threshold equal to the absolute value of the largest (in absolute magnitude) wavelet coefficient.
Although several transformations and empirical distribution function tests were available, Ogden and Parzen (1996) used the transformation $g(x) = x^2$ in Equation (4.10) and the Kolmogorov–Smirnov test statistic.
It should be pointed out that $\sigma_g$ will not be known in practice and hence must be estimated from the data. This problem was addressed in Ogden (1994, Sec. 5.5) by recommending the median absolute deviation of the finest level of wavelet coefficients as in Donoho and Johnstone (1994). By formulating the test statistic as in Equation (4.8), the $\sigma_g$ term is no longer involved, it being replaced by $\sqrt{2}$, which is scale independent. This is seen through the following argument. Let $Y_1, \ldots, Y_N$ be a sequence of $j$th level wavelet coefficients, from the DWT, obtained from a white noise process (let $N$ be dyadic for simplicity). Hence, the wavelet coefficients are also distributed as white noise. Let $\sigma_g^2 \equiv \mathrm{Var}\{Y_1^2\}$ and $\mu_g \equiv E\{Y_1^2\}$. We define the statistic
\[ V_N(t) \equiv \sqrt{N} \left( \frac{\sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2}{\sum_{k=1}^{N} Y_k^2} - \frac{\lfloor Nt \rfloor}{N} \right), \qquad 0 \leq t \leq 1 \]
When looking at the boundary crossing probability for a Brownian bridge process, the asymptotic critical values for $V$ are $\sqrt{2}$ times the asymptotic critical values for $U$. Thus, we do not need to estimate the variance of the squared wavelet coefficients when testing for homogeneity of variance. The relationship between Equations (4.8) and (4.10) is similar to the one between the test of periodogram ordinates by Schuster (1898) and Fisher's g-statistic (Fisher 1929), where standardizing by the sum of squares eliminates having to know the variance of the time series.
[2] computes the partial DWT of order $J$, defined in Section 3.2, using the Haar, D(4) and LA(8) wavelet filters;
[3] discards all coefficients on each scale that make explicit use of the periodic boundary conditions;
[4] computes the test statistic $D$ for all scales based upon the remaining wavelet coefficients; and
[5] rejects the null hypothesis if $(N/2)^{1/2} D$ is greater than the Monte Carlo white noise critical levels.
A slight modification may be made to the DWT-based procedure for large sample sizes, specifically, using asymptotic critical values instead of ones obtained through Monte Carlo experiments (cf. Table 4.3).
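Steps [4] and [5] can be sketched for a single scale as follows. The defining equations for $D$ are not reproduced in this excerpt, so the code assumes the normalized cumulative sum of squares form $P_k = \sum_{i \leq k} W_i^2 / \sum_{i \leq N} W_i^2$ with $D^+ = \max_k \{k/(N-1) - P_k\}$ and $D^- = \max_k \{P_k - (k-1)/(N-1)\}$ for $k = 1, \ldots, N-1$, which reduces to $D^- = P_1$ and $D^+ = 1 - P_1$ when $N = 2$, as in the proof of Equation (4.9). The function names and example inputs are hypothetical.

```python
import math

def cusum_of_squares_D(w):
    """D = max(D+, D-) for one scale of wavelet coefficients (assumed form, see lead-in)."""
    n = len(w)
    total = sum(c * c for c in w)
    d_plus = d_minus = float("-inf")
    running = 0.0
    for k in range(1, n):                       # k = 1, ..., n - 1
        running += w[k - 1] ** 2
        p_k = running / total                   # normalized cumulative sum of squares
        d_plus = max(d_plus, k / (n - 1) - p_k)
        d_minus = max(d_minus, p_k - (k - 1) / (n - 1))
    return max(d_plus, d_minus)

def rejects_homogeneity(w, critical_value):
    """Step [5]: compare sqrt(N / 2) * D with a supplied critical value."""
    return math.sqrt(len(w) / 2.0) * cusum_of_squares_D(w) > critical_value

constant = [1.0] * 64                           # no change in variance
changed = [1.0] * 32 + [5.0] * 32               # abrupt variance change at the midpoint
asymptotic_5pct = 1.358                         # last column of Table 4.3
```

With the asymptotic 5% critical value, `rejects_homogeneity(changed, asymptotic_5pct)` is true while the constant sequence is not rejected; in the full procedure the inputs would be the boundary-free DWT coefficients at each scale rather than these toy sequences.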
The MODWT-based cumulative sum of squares procedure, using Monte Carlo critical values, is similar to the DWT-based procedure already defined: simply substitute the MODWT for the DWT. Asymptotic critical values are not available since the MODWT wavelet coefficients are correlated. However, a slight modification may be made by substituting the equivalent degrees of freedom $\eta_j$ (cf. Section 3.4.2) instead of the sample size $\widetilde{N}_j$ when testing the statistic $D$. If we believe that the
Figure 4.4: Rejection rates for fractional difference processes using white noise critical levels, N = 128, 256, 512, 1024 and 2048 (panels by level and by d = 0.05, 0.25, 0.40 and 0.45; horizontal axis: sample size; vertical axis: rejection rate in percent). The solid line is the Haar wavelet filter, the dotted line is the D(4) and the dashed line is the LA(8).

Figure 4.5: Rejection rates for fractional difference processes using asymptotic critical levels, N = 128, 256, 512, 1024 and 2048 (horizontal axis: sample size; vertical axis: rejection rate in percent). The solid line is the Haar wavelet filter, the dotted line is the D(4) and the dashed line is the LA(8).
to obtain analytical expressions for the quantiles of $D$, a subject for future study.
One may not want to perform Monte Carlo studies in order to obtain critical values for the test statistic $D$. The simulation study described above was run again substituting the asymptotic critical values (last column of Table 4.3) for the Monte Carlo critical values. The results are given in Figure 4.5, using a similar vertical axis to Figure 4.4 for comparison. The percentage of times $D$ exceeded the asymptotic critical levels was within 10% of the theoretical quantile when there were at least 128 wavelet coefficients. The Haar wavelet filter was found to be conservative for all sample sizes; that is, the percentage of times $D$ exceeded the asymptotic critical levels was below the theoretical quantile. Hence, using asymptotic critical values will give reasonable results, if Monte Carlo critical values are unavailable, when at least 128 wavelet coefficients are present.
To investigate how well this approximation performs for large sample sizes, the procedure from Section 4.3 was performed for fractional difference processes of length $N = 2^{15} = 32{,}768$ using a partial DWT of order $J = 8$. Due to the computational time involved, the procedure was only repeated 1000 times. The percentages of times that $D$ exceeded the white noise critical levels under these conditions were found to be quite close to the rejection rates established from asymptotic critical values, with increased variability due to the reduced number of iterations in the Monte Carlo study. Thus, all the simulation studies we have conducted to date indicate that the DWT adequately decorrelates long memory processes for the purpose of using the test statistic $D$.
Although the MODWT of a fractional difference process exhibits correlation between
wavelet coefficients, it does retain a greater number of coefficients per scale.
This may be a useful attribute for testing a wider range of alternative hypotheses,
not just a sudden change of variance. To examine the cumulative sum of squares
procedure using the MODWT, an investigation similar to that for the DWT was performed.
The correlation structure of MODWT wavelet coefficients invalidates our ability to
use an asymptotic distribution (like a Brownian bridge process for the DWT) when
testing for homogeneity of variance. Although Monte Carlo techniques are relatively
easy to implement, they depend on the sample size. When repeatedly testing under
unknown sample sizes, e.g., testing multiple variance changes in Section 4.6, this
requires considerable computing time.
One possible approach is to compensate for the correlation structure by modifying
the test statistic D, computed via the MODWT, using the equivalent degrees of freedom.
The equivalent degrees of freedom argument was investigated in Section 3.4.2
with respect to the wavelet variance. The distribution under this argument was not
found to differ too drastically from the true distribution of the MODWT estimator of
the wavelet variance. Here, instead of testing $(N/2)^{1/2} D$ we propose to use
$(\eta/2)^{1/2} D$, where $\eta$ is the equivalent degrees of freedom given by
Equation (3.14). By doing so, we attempt to obviate the need for determining critical
levels via Monte Carlo experiments.
To investigate this test, we followed the procedure outlined in Section 4.3, of order
J = 4, repeated 10,000 times each for d = 0.05, 0.25, 0.4 and 0.45. The percentages of
times that $(\eta/2)^{1/2} D$ exceeded the asymptotic critical levels are recorded in
Figure 4.6. We see that the percentages are quite close to the nominal rejection rates
when the long memory parameter is small, and become more and more conservative
as d approaches 0.5. Among the three wavelet filters, the LA(8) appears to give
the most consistent rejection rates across all sample sizes and long memory parameters.
The D(4) also performs reasonably well, but is quite a bit more conservative
than the LA(8) wavelet filter, especially as the number of wavelet
coefficients decreases. This problem has already been noted when using the DWT;
that is, when using asymptotic critical values all wavelet filters suffer as the number
of wavelet coefficients decreases.
The equivalent degrees of freedom adjustment is crude; however, by using it the
MODWT may be used to conduct an approximate level $\alpha$ test for variance
homogeneity of a fractional difference process on a scale by scale basis when there
are at least 64 wavelet coefficients.

[Figure 4.6 plot: Rejection Rate (%) versus Sample Size.]
Figure 4.6: Rejection rates for fractional difference processes using the MODWT
and asymptotic critical values, adjusted using equivalent degrees of freedom (N =
128, 256, 512, 1024, 2048). The solid line is the Haar wavelet filter, the dotted line is
the D(4) and the dashed line is the LA(8).
Table 4.4: Performance of cumulative sums of squares (CSS) method for fractional
difference processes (N = 656, d = 0.40) with one change of variance at k = 100.
All tests were performed at the $\alpha$ = 0.05 level of significance. The parameter
$\varepsilon$ indicates the variance ratio between the first 100 and remaining
observations, and the parameter $\varphi$ is the octave band by octave band variance ratio.

                                                  CSS
Variance ratio        Level   $\varphi$    Haar     D(4)     LA(8)
$\varepsilon$ = 1.5   1       1.82         89.9     92.0     92.2
                      2       1.55         42.5     42.7     40.1
                      3       1.33         13.9     14.0     11.7
                      4       1.19          8.2      7.5      7.0
$\varepsilon$ = 2     1       2.64         99.9     99.9     99.9
                      2       2.10         82.8     83.0     80.3
                      3       1.65         32.9     31.7     24.8
                      4       1.38         12.2     11.1      8.5
$\varepsilon$ = 3     1       4.29        100.0    100.0    100.0
                      2       3.19         98.9     98.8     98.5
                      3       2.30         67.9     63.9     52.7
                      4       1.75         25.4     22.3     13.1
$\varepsilon$ = 4     1       5.93        100.0    100.0    100.0
                      2       4.29         99.9     99.8     99.8
                      3       2.95         84.8     82.6     71.6
                      4       2.13         40.3     32.9     19.9
discussion).
4.4.3 Conclusions
Several procedures for testing homogeneity of variance, on a scale by scale basis,
have been investigated with respect to fractional difference processes. When using
Monte Carlo critical values, the DWT-based cumulative sum of squares procedure is
shown to have an adequate empirical size. When using asymptotic critical values, this
procedure gives reasonable results when at least 128 wavelet coefficients are present
for testing. The MODWT-based CSS procedure using asymptotic critical values is
slightly conservative when at least 128 wavelet coefficients are used.
I have also shown that the cumulative sum of squares test statistic, based on the
DWT, can successfully detect changes of variance in fractional difference processes.
Depending on the magnitude of the variance change, the first two scales are primarily
affected. This is to be expected since the octave band variance ratio decreases as the
scale of the DWT increases.
the DWT. This way, the location of a variance change may be associated with a
specific observation in the original time series. This is another example of how the
MODWT, through a lack of subsampling, is useful over and above the usual DWT.
[2a] computing the partial MODWT of order J, defined in Section 3.3, using the
Haar, D(4) and LA(8) wavelet filters
[3a] discarding all MODWT wavelet coefficients, on each scale, affected by the
periodic boundary conditions
[4a] computing the statistic $\tilde{D}$ for all scales based upon the remaining MODWT
wavelet coefficients
[5a] recording the location of the wavelet coefficient at which the statistic $\tilde{D}$,
computed using the MODWT, attains its value and adjusting for the phase by
shifting the location $L_j/2$ units to the left (see Percival and Mofjeld (1997) for
more details).
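The steps above can be sketched for the simplest case, the Haar filter at level 1, where the filter width is L_1 = 2 and the phase shift is therefore one unit. This is a minimal illustration, not the implementation used in the simulations: the centered statistic D_k = C_k/C_T - k/T stands in for the statistic of Section 4.3, indices are 0-based, and the function names are hypothetical.

```python
import numpy as np

def haar_modwt_level1(x):
    """Level-1 Haar MODWT wavelet coefficients with circular boundary.

    W[t] = (x[t] - x[t-1]) / 2, where x[-1] wraps around to the last value.
    """
    x = np.asarray(x, dtype=float)
    return 0.5 * (x - np.roll(x, 1))

def locate_variance_change(x):
    """Return a 0-based location estimate from level-1 Haar MODWT coefficients."""
    w = haar_modwt_level1(x)
    filter_width = 2                  # L_1 = 2 for the Haar filter at level 1
    w = w[filter_width - 1:]          # drop the coefficient that wraps around
    T = w.size
    c = np.cumsum(w ** 2)
    d = c / c[-1] - np.arange(1, T + 1) / T
    i = int(np.argmax(np.abs(d)))     # index within the retained coefficients
    t = i + (filter_width - 1)        # time index of that coefficient in x
    return t - filter_width // 2      # phase adjustment: L_1 / 2 units left

# Deterministic toy series: alternating signs, amplitude 1 for the first
# 100 observations and 3 afterwards (a variance ratio of 9).
n = 512
signs = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
amp = np.where(np.arange(n) < 100, 1.0, 3.0)
location = locate_variance_change(signs * amp)
```

For this toy series the estimate is the 0-based index of the last observation in the low-variance regime. Longer filters and higher levels require the corresponding L_j and more boundary coefficients to be discarded.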
[Figure 4.7 plot: boxplots of estimated locations, panels labelled Level 1 and Level 2; horizontal axis: Wavelet Coefficient (100 to 500).]
Figure 4.7: Estimated locations of variance change at k = 100 for fractional difference
processes (N = 656, d = 0.4) using the MODWT. Each boxplot contains a varying
number of estimates corresponding to the associated rejection rate.
As in Section 4.4.2, Step [1] was modified by adding a sequence of white noise to
the first 100 observations, creating variance ratios of 1.5, 2, 3 and 4. The estimated
locations of the variance changes are displayed in Figure 4.7. The estimates are
roughly centered around the 100th wavelet coefficient for j = 1, 2, with the spread
narrowing as the variance ratio increases. There is a very slight difference between
wavelet filters, the broader spread being associated with the longer wavelet filters.
However, for variance ratios of $\varepsilon$ = 2 or greater all three wavelets appear
to perform equally well.
The estimates from the first level of the MODWT have a median value closer to
the truth, with much less spread at every combination of variance ratio and wavelet
filter, as compared to the second level. The slight positive bias appearing in the
first scale, with more bias in the second, appears to be an intrinsic feature of the
cumulative sum of squares method. Inclan and Tiao (1994) showed that the average
estimated location of change is biased towards the middle of the series when using
such a procedure for sample sizes of 100, 200 and 500, and variance ratios of
$\varepsilon$ = 2 and 3. This should be kept in mind when interpreting the results
from such an analysis.
4.5.3 Conclusions
I have shown that the cumulative sum of squares statistic, using the MODWT, can
accurately locate a change of variance in fractional difference processes. When the
variance ratio is large enough ($\varepsilon \ge 2$), the estimates from wavelet
coefficients at the first scale are distributed very tightly around the true location.
Wavelet coefficients at the second scale require larger variance ratios
($\varepsilon \ge 3$) to achieve the same level of accuracy.
To reduce bias, I recommend using the estimate associated with the first level when
trying to locate a sudden change of variance in a time series.
[1] Determine the test statistic D, via the procedure described in Section 4.3, and
record the point $k_1$ at which D is attained. If D exceeds its critical value for
a given level of significance $\alpha$, then proceed to the next step. If D is less than
the critical value, the algorithm terminates.
[2] Determine the test statistic D for the new time series $Y_1, \ldots, Y_{k_1}$. If D exceeds
its critical value, then repeat step 2 until D is less than its critical value.
[3] Determine the test statistic D for the new time series $Y_{k_1}, \ldots, Y_N$. Repeat
step 3 until D is less than its critical value.
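The iteration in steps [1] to [3] can be sketched as a recursive binary segmentation. This is an illustrative variant under stated assumptions, not the exact algorithm above: it uses the centered statistic D_k = C_k/C_T - k/T of Inclan and Tiao (1994), a fixed asymptotic 5% critical value of 1.358 for every segment, and a hypothetical minimum segment length `min_len` below which no test is attempted.

```python
import numpy as np

def css_statistic(w):
    """Return (sqrt(T/2) * max|D_k|, k_hat), with k_hat a 1-based index."""
    w = np.asarray(w, dtype=float)
    T = w.size
    c = np.cumsum(w ** 2)
    d = c / c[-1] - np.arange(1, T + 1) / T
    i = int(np.argmax(np.abs(d)))
    return float(np.sqrt(T / 2.0) * abs(d[i])), i + 1

def iterated_css(w, crit=1.358, min_len=16):
    """Split `w` wherever the CSS statistic of a segment exceeds `crit`.

    Returns the sorted change points, each given as the number of
    coefficients that precede the change.
    """
    w = np.asarray(w, dtype=float)
    changes = []

    def split(lo, hi):
        if hi - lo < min_len:
            return
        stat, k = css_statistic(w[lo:hi])
        if stat > crit:
            split(lo, lo + k)       # look for further changes on the left
            split(lo + k, hi)       # and on the right
            changes.append(lo + k)

    split(0, w.size)
    return sorted(changes)

# Deterministic toy series with alternating signs and piecewise-constant amplitude.
t = np.arange(512)
signs = np.where(t % 2 == 0, 1.0, -1.0)
one_change = signs * np.where(t < 100, 1.0, 3.0)
two_changes = signs * np.where((t >= 250) & (t < 350), 3.0, 1.0)
found_one = iterated_css(one_change)
found_two = iterated_css(two_changes)
```

In practice the critical value would depend on the segment length, as discussed for the Monte Carlo versus asymptotic critical values, so the fixed `crit` here should be read as a simplification.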
Table 4.5: Empirical power of iterated Cumulative Sum of Squares (CSS) algorithm
for fractional difference processes (N = 512, d = 0.4) with one variance change at
k = 100.

                               Haar                  D(4)                  LA(8)
Variance ratio        Level    0      1      2       0      1      2       0      1      2
$\varepsilon$ = 1.5   1         9.5   85.2    5.3     7.6   87.8    4.6     7.3   88.8    3.9
                      2        58.9   39.8    1.2    58.9   39.7    1.4    60.4   38.5    1.2
                      3        87.5   12.2    0.3    88.2   11.6    0.2    90.3    9.5    0.2
                      4        95.1    4.9    0.0    95.2    4.8    0.0    95.9    4.1    0.0
$\varepsilon$ = 2     1         0.1   93.0    6.9     0.1   93.5    6.4     0.1   94.2    5.7
                      2        17.7   79.6    2.8    17.9   79.5    2.6    20.6   77.1    2.3
                      3        69.2   30.1    0.7    71.2   28.2    0.6    77.2   22.4    0.4
                      4        90.8    9.2    0.0    91.2    8.8    0.0    93.5    6.5    0.0
$\varepsilon$ = 3     1         0.0   92.9    7.1     0.0   93.6    6.4     0.0   93.8    6.2
                      2         1.3   95.4    3.3     1.2   95.5    3.3     1.9   95.0    3.2
                      3        33.8   64.7    1.5    38.3   60.4    1.2    49.4   49.6    1.0
                      4        79.4   20.6    0.0    80.9   19.1    0.0    87.2   12.3    0.0
$\varepsilon$ = 4     1         0.0   92.9    7.1     0.0   92.7    7.3     0.0   93.2    6.8
                      2         0.1   96.3    3.6     0.1   96.3    3.6     0.2   96.5    3.3
                      3        16.6   81.5    1.9    19.1   79.1    1.8    29.1   69.4    1.5
                      4        66.0   34.0    0.0    69.1   30.9    0.0    79.4   20.6    0.0
Table 4.6: Empirical power of the iterated Cumulative Sum of Squares (CSS) algorithm for fractional difference
processes (N = 512, d = 0.4) with two variance changes at k1 = 250 and k2 = 350. Variance ratios are given by $\varepsilon$.

                               Haar                          D(4)                          LA(8)
Variance ratio        Level    0      1      2      3        0      1      2      3        0      1      2      3
$\varepsilon$ = 1.5   1        14.5    9.3   71.8    4.3     11.7   10.4   73.6    4.2     11.2   11.4   74.2    3.2
                      2        67.4   20.8   11.7    0.1     67.3   22.6   10.0    0.1     67.7   23.8    8.5    0.1
                      3        88.7   10.1    1.1    0.0     88.8   10.6    0.6    0.0     90.4    9.2    0.4    0.0
                      4        94.6    5.4    0.0    0.0    100.0    0.0    0.0    0.0    100.0    0.0    0.0    0.0
$\varepsilon$ = 2     1         0.1    0.2   91.8    7.9      0.0    0.2   92.4    7.3      0.0    0.3   94.0    5.8
                      2        26.7   17.2   55.4    0.7     26.2   22.1   51.1    0.6     27.2   26.4   45.8    0.5
                      3        75.8   18.5    5.6    0.0     77.3   19.0    3.7    0.0     78.8   18.3    2.9    0.0
                      4        91.5    8.5    0.0    0.0    100.0    0.0    0.0    0.0    100.0    0.0    0.0    0.0
$\varepsilon$ = 3     1         0.0    0.0   90.4    9.6      0.0    0.0   91.4    8.6      0.0    0.0   92.5    7.5
                      2         1.6    2.2   93.8    2.3      1.6    4.0   92.5    1.9      2.2    5.5   90.4    1.9
                      3        48.0   24.0   27.8    0.2     49.7   29.8   20.3    0.2     55.2   29.4   15.3    0.1
                      4        83.5   16.5    0.0    0.0    100.0    0.0    0.0    0.0    100.0    0.0    0.0    0.0
$\varepsilon$ = 4     1         0.0    0.0   89.5   10.5      0.0    0.0   90.5    9.5      0.0    0.0   91.4    8.6
                      2         0.1    0.2   96.5    3.2      0.1    0.7   96.6    2.6      0.1    0.9   96.3    2.8
                      3        26.3   18.8   54.4    0.6     29.6   28.0   42.1    0.4     34.9   28.9   35.7    0.5
                      4        74.0   26.0    0.0    0.0    100.0    0.0    0.0    0.0    100.0    0.0    0.0    0.0
variance ratios. The iterated CSS method once again performs quite well for small
variance ratios ($\varepsilon$ = 1.5), with a slight increase in power as the wavelet
filter increases in length. For larger variance ratios, the first scale gives a maximum
rejection rate of 94% and then hovers around 90% for very large $\varepsilon$. All
errors in the first scale, for higher variance ratios, are towards overestimating the
number of variance changes. The second scale, which exhibits almost no power for
smaller variance ratios, rapidly approaches the 90 to 95% range for
$\varepsilon \ge 3$ and also errs primarily towards overestimating the
number of variance changes. The 100% rejection rates for the D(4) and LA(8)
wavelet filters in the fourth scale occur because of a reduction, due to boundary
effects, in the number of wavelet coefficients below a minimum established threshold.
[Figures 4.8a-d plots: panels by wavelet filter (D(4), LA(8)) and Level (1 to 4); horizontal axis: Wavelet Coefficient (0 to 500).]
Figure 4.8a: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon$ = 1.5.
Figure 4.8b: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon$ = 2.
Figure 4.8c: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon$ = 3.
Figure 4.8d: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon$ = 4.
quite well when the variance ratio is relatively large ($\varepsilon \ge 2$), especially
in the first two scales. The third and fourth scales are quite spread out and not
recommended for estimating variance change points in practice.
Figures 4.9a-d display the estimated location of variance change for various
fractional difference processes with two variance changes. For small variance ratios
($\varepsilon$ = 1.5) the cumulative sum of squares procedure does a decent job of
locating the variance changes in the first scale, with mixed results for the second scale.
As before, we do not expect much information to come from looking at higher scales.
However, as the magnitude of the variance ratio increases, the higher scales (j = 3, 4)
do exhibit structure similar to the first two scales. Regardless, we shall strictly use
the first two scales for inference in the future. With respect to the first two scales,
as the variance ratio increases to, say, $\varepsilon$ = 3 or 4, the bimodality is readily
apparent. As was the case for a single variance change, the longer wavelet filters give
a slightly more spread out distribution for the locations of the variance changes. To
be more precise, the estimated locations appear to be skewed to the right at k1 = 250
and k2 = 350 for the D(4) and LA(8) wavelet filters, especially in the second scale.
4.6.4 Conclusions
I have presented the iterated cumulative sums of squares (CSS) algorithm for detecting
and locating multiple variance changes in time series with long-range dependence.
The first scale of wavelet coefficients is quite powerful for variance ratios of
$\varepsilon$ = 2 or greater, for either one or two variance change-points. The second
scale is equally powerful, but only for variance ratios of $\varepsilon$ = 3 or greater.
This procedure also performs well at locating single or multiple variance changes using
the auxiliary test statistic computed via the MODWT.
[Figures 4.9a-d plots: panels by wavelet filter (Haar, D(4)) and Level (1 to 4); vertical axis: Percent of Total; horizontal axis: Wavelet Coefficient (0 to 500).]
Figure 4.9a: Estimated locations of variance change at k1 = 251 and k2 = 350 for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 1.5.
Figure 4.9b: Estimated locations of variance change at k1 = 251 and k2 = 350 for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 2.
Figure 4.9c: Estimated locations of variance change at k1 = 251 and k2 = 350 for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 3.
Figure 4.9d: Estimated locations of variance change at k1 = 251 and k2 = 350 for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 4.
Hence, the variance of $X_{t_0+1}, \ldots, X_N$ is equivalent to the variance of
$X_1, \ldots, X_{t_0}$. The only change at time $t_0$ occurs with respect to the long
memory parameter.
[Figure 4.10 plot: spectra in dB (vertical axis, -10 to 15) for d = 0.05, d = 0.25, d = 0.40 and d = 0.45.]
Figure 4.10: Spectra of fractional difference processes and octave bands of the DWT.
Frequencies between the vertical dashed lines correspond to approximate pass-bands
of the DWT. The spectra have been normalized in order to produce time series with
the variance of a fractional difference process with long memory parameter d = 0.05.
Figure 4.10 shows how the spectra from several different fractional difference
processes compare throughout octave bands which approximately correspond to scales
of the DWT. All spectra have the same total energy, which is equal to the variance
of a fractional difference process with long memory parameter d = 0.05. Since the
wavelet variance is approximately the integral of the spectral density function over
an octave band (Percival and Guttorp 1994), we would not expect to detect a change
in the variance when the spectra of the two fractional difference processes cross. In
fact, the variability of the wavelet coefficients would be greater in the section of the
time series with the smaller long memory parameter than in the section with the larger
long memory parameter for the scales before they intersect, with this pattern reversed
for all scales after the intersection.
Table 4.7: Rejection rates for a change in the long memory parameter of a fractional
difference process (N = 1024). For all cases, observations $X_1, \ldots, X_{t_0}$ have long
memory parameter $d_1$ = 0.05, while $X_{t_0+1}, \ldots, X_N$ have long memory parameter $d_2$.
The quantity $\varphi$ provides the octave band by octave band variance ratio.

                Level   $\varphi$
                3       0.93      5.1     6.5     5.7     5.7     6.1     6.2     5.7     4.5     5.5
                4       0.71      5.5     5.1     5.3    10.6    11.5    12.7     9.8    10.9     7.4
                5       0.54      5.1     6.7     4.5    13.3    15.4    14.0    14.7    14.3    10.8
                6       0.41      5.6     5.3     4.9    15.5    15.5    11.2    17.4    14.1     5.4
$d_2$ = 0.4     1       3.08    100.0   100.0   100.0   100.0   100.0   100.0    99.8   100.0    99.8
                2       2.16     77.0    80.2    82.7    96.2    97.0    96.6    47.1    49.3    42.3
                3       1.37     13.1    15.5    11.4    18.9    21.8    19.7     7.1     7.7     7.0
                4       0.85      5.4     5.6     5.3     6.5     7.4     5.7     5.4     6.7     6.2
                5       0.53      6.5     6.2     5.0    16.9    17.0    14.6    16.0    14.8    11.2
                6       0.32      6.1     6.5     5.0    19.5    15.8     9.6    24.1    20.7     6.6
$d_2$ = 0.45    1       5.76    100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0
                2       3.83     99.6    99.8    99.9   100.0   100.0   100.0    96.2    97.1    96.2
                3       2.28     58.7    61.9    57.5    82.0    81.8    82.6    20.3    21.1    19.2
                4       1.32      9.1    10.7     8.5     8.9     7.9     9.4     5.4     4.1     5.7
                5       0.76      5.4     5.2     5.7     7.2     7.8     7.5     6.0     7.6     7.4
                6       0.44      5.2     5.2     6.0    13.2    10.8     8.4    13.6    13.8     6.1
boundary between the fourth and fifth scales, and hence, both sets of rejection rates
are relatively small with an increase seen in the sixth scale.
4.7.3 Conclusions
I have briefly investigated how the test for homogeneity of variance reacts to changes
in the long memory parameter of a fractional difference process, which constitute
a simple example of a generalized fractional difference process (Wang et al. 1997).
Specifically, the simulation procedure in Section 2.2 has been modified to produce time
series where the long memory parameter changes at a given time, but the variance
of the process remains constant. When applying the testing procedure to time series
simulated this way, the pattern of rejection rates across scales differs from that
found when a simple change in variance occurs. Hence, this method shows promise in
addressing a very different alternative hypothesis: one where the variance remains
constant but the autocovariance structure of the time series changes abruptly. The
current procedure is crude and would benefit greatly from additional research.
Chapter 5
WAVELET ANALYSIS OF COVARIANCE
In this chapter, we consider the use of wavelets in the analysis of multivariate
time series. In his thesis, Hudgins (1992) introduced the concepts of the wavelet
cross spectrum and wavelet cross correlation, both in terms of the continuous wavelet
transform. In a subsequent paper, Hudgins, Friehe, and Mayer (1993) applied these
concepts to atmospheric turbulence. They found that the bivariate wavelet techniques
provided a better analysis of the data than traditional Fourier methods, especially
at low frequencies. Lindsay, Percival, and Rothrock (1996) defined the sample wavelet
covariance for the discrete wavelet transform (DWT) and maximal overlap discrete
wavelet transform (MODWT) along with confidence intervals based on large sample
results. These methods were applied to the surface temperature and albedo of
ice pack in the Beaufort Sea. Kawata and Arimoto (1996) discussed the wavelet
correlation and its ability to match features between two signals. They showed that the
estimated wavelet correlation is effective when compared to other measures of "local
correlation," such as the Gabor transform (or short-time Fourier transform). Li and
Nozaki (1997) also introduced the wavelet cross-correlation in terms of the continuous
wavelet transform and related it to the cross spectrum. They went on to use
the real portion of the wavelet cross-correlation to analyze both simulated and real
signals. Recently, Torrence and Compo (1998) discussed the cross-wavelet spectrum,
which is complex valued, and the cross-wavelet power, which is simply the magnitude
of their cross-wavelet spectrum. They also introduced confidence intervals for their
cross-wavelet power and compared the Southern Oscillation Index with the Niño3 sea
surface temperature.
Here, we introduce quantities that measure the association between two stationary
processes based on the DWT and MODWT. First, the wavelet covariance
for bivariate stationary time series is introduced along with the notion of a
decomposition of covariance. That is, the wavelet covariance is shown to decompose the
covariance between two stationary processes on a scale by scale basis. The wavelet
correlation is also introduced, which is analogous to the usual correlation coefficient
but utilizes the wavelet covariance and wavelet variance. Asymptotic normality of the
wavelet covariance and correlation is established. Estimation procedures are provided
along with approximate confidence intervals for the estimated wavelet covariance and
wavelet correlation. Both the wavelet covariance and wavelet correlation are generalized
into the wavelet cross-covariance and wavelet cross-correlation. The lack of shift
invariance of the DWT is shown to bias the variance of the DWT estimator of the
wavelet cross-covariance. This may arise due to misalignment between the two time
series. Finally, moment properties of two potential estimators for the variance of the
wavelet covariance are investigated. One is shown to be clearly superior to the other.
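5.1 Definition of the Wavelet Covariance
Let $\{X_t\}$ and $\{Y_t\}$ be stationary processes with univariate spectra (also known as
autospectra) $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. The wavelet covariance of
$\{X_t, Y_t\}$ for scale $\lambda_j = 2^{j-1}$ is defined to be
\[
\gamma_{XY}(\lambda_j) \equiv \frac{1}{2\lambda_j} \operatorname{Cov}\left\{ W^{(X)}_{j,t}, W^{(Y)}_{j,t} \right\}
  = \frac{1}{2\lambda_j} E\left\{ W^{(X)}_{j,t} W^{(Y)}_{j,t} \right\}, \tag{5.1}
\]
where $\{W^{(X)}_{j,t}\}$ and $\{W^{(Y)}_{j,t}\}$ are the scale $\lambda_j$ wavelet coefficients
for $\{X_t\}$ and $\{Y_t\}$, respectively.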
closely follows the proof of decomposition of variance for the wavelet variance (see
Percival and Walden (1999, Sec. 8.1)), the major complication here being that the
cross spectrum $S_{XY}(\cdot)$ is a complex-valued function (cf. Section C.1). We begin by
expressing the covariance between two filtered time series in the Fourier domain.
Proposition 5.1 Suppose that $\{X_t\}$ and $\{Y_t\}$ are zero-mean weakly stationary
processes with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. If
$\{a_l \mid l = 0, \ldots, L-1\}$ is a filter of length $L$ with transfer function defined to be
\[
A(f) \equiv \sum_{l=0}^{L-1} a_l e^{-i 2\pi f l},
\]
then the covariance between $\{a_l \ast X_t\}$ and $\{a_l \ast Y_t\}$ is given by
\[
\operatorname{Cov}\{a_l \ast X_t, a_l \ast Y_t\} = \int_{-1/2}^{1/2} |A(f)|^2 S_{XY}(f)\, df.
\]
where $\{Z_X(\cdot)\}$ and $\{Z_Y(\cdot)\}$ are not only orthogonal but also cross-orthogonal
processes; i.e., $E[dZ_X(f)\, dZ_Y^*(f')] = 0$, $f \ne f'$. Define $\{U_t\}$ and $\{V_t\}$ to be
filtered versions of $\{X_t\}$ and $\{Y_t\}$, respectively; i.e.,
\[
U_t \equiv a_l \ast X_t = \sum_{l=0}^{L-1} a_l X_{t-l}
\quad\text{and}\quad
V_t \equiv a_l \ast Y_t = \sum_{l=0}^{L-1} a_l Y_{t-l}.
\]
From Section A.2 we know that convolution in the time domain is equivalent to
multiplication in the Fourier domain; hence, alternative representations for $\{U_t\}$ and
$\{V_t\}$ are given by the spectral representation theorem (Equation (B.1)) via
\[
U_t = \int_{-1/2}^{1/2} A(f) e^{i 2\pi f t}\, dZ_X(f)
\quad\text{and}\quad
V_t = \int_{-1/2}^{1/2} A(f) e^{i 2\pi f t}\, dZ_Y(f).
\]
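The covariance between $\{U_t\}$ and $\{V_t\}$, using Fubini's theorem (Lehmann 1983, p. 15)
and the fact that $\{Z_X(\cdot)\}$ and $\{Z_Y(\cdot)\}$ are cross-orthogonal processes, is therefore
\[
\operatorname{Cov}\{U_t, V_t\} = E\{U_t V_t\}
  = E\left\{ \int_{-1/2}^{1/2} A^*(f') e^{-i 2\pi f' t}\, dZ_X^*(f')
             \int_{-1/2}^{1/2} A(f) e^{i 2\pi f t}\, dZ_Y(f) \right\}
  = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \cdots
\]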
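Proof. Because $\{\widetilde{W}^{(X)}_{j,t}\}$ and $\{\widetilde{W}^{(Y)}_{j,t}\}$ are obtained by
filtering the stationary processes $\{X_t\}$ and $\{Y_t\}$, respectively, we know that
$\{\widetilde{W}^{(X)}_{j,t}\}$ and $\{\widetilde{W}^{(Y)}_{j,t}\}$ are stationary
processes with autospectra defined by
\[
S_{j,X}(f) = \widetilde{\mathcal{H}}_j(f) S_X(f)
\quad\text{and}\quad
S_{j,Y}(f) = \widetilde{\mathcal{H}}_j(f) S_Y(f), \tag{5.2}
\]
where $\widetilde{\mathcal{H}}_j(\cdot)$ is the squared gain function for $\{\tilde{h}_{j,l}\}$.
Since the wavelet covariance $\gamma_{XY}(\lambda_j)$ is the covariance between
$\{\widetilde{W}^{(X)}_{j,t}\}$ and $\{\widetilde{W}^{(Y)}_{j,t}\}$, and since the integral of
the cross spectrum $S_{XY}(\cdot)$ is equal to this covariance, we can use Proposition 5.1 to
obtain
\[
\gamma_{XY}(\lambda_j) = \int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j(f) S_{XY}(f)\, df;
\]
similarly,
\[
\operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J,t}, \widetilde{V}^{(Y)}_{J,t} \right\}
  = \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f)\, df,
\]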
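where $\widetilde{\mathcal{G}}_J(\cdot)$ is the squared gain function for $\{\tilde{g}_{J,l}\}$.
The squared gain functions for $\{\tilde{h}_{j,l}\}$ and $\{\tilde{g}_{J,l}\}$ are given by a
formula equivalent to Equation (3.6) for squared gain functions; i.e.,
\[
\widetilde{\mathcal{H}}_j(f) = \widetilde{\mathcal{H}}(2^{j-1} f) \prod_{l=0}^{j-2} \widetilde{\mathcal{G}}(2^l f)
\quad\text{and}\quad
\widetilde{\mathcal{G}}_J(f) = \prod_{l=0}^{J-1} \widetilde{\mathcal{G}}(2^l f).
\]
Since $\widetilde{\mathcal{H}}(f) = \mathcal{H}(f)/2$ and $\widetilde{\mathcal{G}}(f) = \mathcal{G}(f)/2$,
we may use Equation (3.5) to say that $\widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) = 1$
for all $f$. We therefore have
\[
\operatorname{Cov}\{X_t, Y_t\} = \int_{-1/2}^{1/2} S_{XY}(f)\, df
  = \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) \right] S_{XY}(f)\, df
  = \operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{1,t}, \widetilde{V}^{(Y)}_{1,t} \right\} + \gamma_{XY}(\lambda_1),
\]
and the case when $J = 1$ holds. We now proceed to prove the main assertion by
induction. Assume the property holds for $J - 1$; i.e.,
\[
\operatorname{Cov}\{X_t, Y_t\}
  = \operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J-1,t}, \widetilde{V}^{(Y)}_{J-1,t} \right\}
  + \sum_{j=1}^{J-1} \gamma_{XY}(\lambda_j).
\]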
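So we have
\[
\begin{aligned}
\operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J-1,t}, \widetilde{V}^{(Y)}_{J-1,t} \right\}
  &= \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_{J-1}(f) S_{XY}(f)\, df \\
  &= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}(2^{J-1} f) + \widetilde{\mathcal{H}}(2^{J-1} f) \right]
     \prod_{l=0}^{J-2} \widetilde{\mathcal{G}}(2^l f)\, S_{XY}(f)\, df \\
  &= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}_J(f) + \widetilde{\mathcal{H}}_J(f) \right] S_{XY}(f)\, df \\
  &= \operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J,t}, \widetilde{V}^{(Y)}_{J,t} \right\} + \gamma_{XY}(\lambda_J).
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
\operatorname{Cov}\{X_t, Y_t\}
  &= \operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J,t}, \widetilde{V}^{(Y)}_{J,t} \right\}
     + \gamma_{XY}(\lambda_J) + \sum_{j=1}^{J-1} \gamma_{XY}(\lambda_j) \\
  &= \operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J,t}, \widetilde{V}^{(Y)}_{J,t} \right\}
     + \sum_{j=1}^{J} \gamma_{XY}(\lambda_j). \qquad \Box
\end{aligned}
\]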
The decomposition of covariance will now be established by allowing $J \to \infty$.
This is intuitively plausible since the scaling filter is capturing smaller and smaller
portions of the cross spectrum as $J$ gets larger.
Theorem 5.1 Let $\{X_t\}$ and $\{Y_t\}$ be stationary processes with autospectra $S_X(\cdot)$ and
$S_Y(\cdot)$, respectively, and let $\gamma_{XY}(\lambda_j)$ be the wavelet covariance associated
with scale $\lambda_j$; then
\[
\sum_{j=1}^{\infty} \gamma_{XY}(\lambda_j) = \operatorname{Cov}\{X_t, Y_t\};
\]
that is, the wavelet covariance decomposes the covariance between $\{X_t\}$ and $\{Y_t\}$ on
a scale by scale basis.
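A finite-sample analogue of this decomposition can be checked numerically. The sketch below is an illustration under stated assumptions, not the estimator studied in this chapter: it uses the Haar MODWT with circular filtering, keeps the boundary coefficients (so the sample identity is exact), and works with mean-centered series so that the average cross-product equals the sample covariance with divisor N. The function names are hypothetical.

```python
import numpy as np

def haar_modwt(x, J):
    """Circular Haar MODWT via the pyramid algorithm.

    Returns (list of wavelet coefficient arrays for levels 1..J,
             scaling coefficient array at level J).
    """
    v = np.asarray(x, dtype=float)
    wavelets = []
    for j in range(1, J + 1):
        lagged = np.roll(v, 2 ** (j - 1))   # v[t - 2^(j-1)], circularly
        wavelets.append(0.5 * (v - lagged))  # filter (1/2, -1/2)
        v = 0.5 * (v + lagged)               # filter (1/2,  1/2)
    return wavelets, v

def wavelet_covariances(x, y, J):
    """Average cross-products of MODWT coefficients, level by level."""
    wx, vx = haar_modwt(x, J)
    wy, vy = haar_modwt(y, J)
    gamma = [float(np.mean(a * b)) for a, b in zip(wx, wy)]
    scaling_cov = float(np.mean(vx * vy))
    return gamma, scaling_cov

rng = np.random.default_rng(1)
x = rng.standard_normal(256)
y = 0.6 * x + rng.standard_normal(256)
x = x - x.mean()
y = y - y.mean()

gamma, scaling_cov = wavelet_covariances(x, y, J=4)
total = sum(gamma) + scaling_cov        # scale-by-scale pieces plus remainder
sample_cov = float(np.mean(x * y))      # sample covariance (divisor N)
```

Here `total` and `sample_cov` agree up to floating-point error, mirroring the finite-J identity used in the induction argument; the entries of `gamma` show how the covariance is spread across scales.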
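Lemma 5.1 For all $\epsilon > 0$, there exists a $J_\epsilon$ such that
$\left| \operatorname{Cov}\{ \widetilde{V}^{(X)}_{J,t}, \widetilde{V}^{(Y)}_{J,t} \} \right| < \epsilon$
for $J > J_\epsilon$.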
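Proof. Because $\sum_l g_{J,l}^2 = 1$ and $\tilde{g}_{J,l} = g_{J,l}/2^{J/2}$, we have
$\sum_l \tilde{g}_{J,l}^2 = 1/2^J$. Parseval's relation (cf. Section A.2) tells us that
\[
\int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f)\, df
  = \int_{-1/2}^{1/2} \left| \widetilde{G}_J(f) \right|^2 df
  = \sum_{l=0}^{L_J - 1} \tilde{g}_{J,l}^2 = \frac{1}{2^J}.
\]
\[
\begin{aligned}
\left| \operatorname{Cov}\left\{ \widetilde{V}^{(X)}_{J,t}, \widetilde{V}^{(Y)}_{J,t} \right\} \right|
  &= \left| \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f)\, df \right| \\
  &\le \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) \left| S_{XY}(f) \right| df \\
  &= \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) A_{XY}(f)\, df \\
  &\le C \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f)\, df = \frac{C}{2^J} < \epsilon.
\end{aligned}
\]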
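If $A_{XY}(\cdot)$ cannot be bounded by any finite number $C$, there at least exists a constant
$C$ such that
\[
\int_{A_{XY}(f) \ge C} A_{XY}(f)\, df < \frac{\epsilon}{2}
\]
using a Lebesgue integral. A rough bound on the squared gain function of the scaling
filter for Daubechies wavelets is $\widetilde{\mathcal{G}}_J(f) \le 1$, so for all $J > J_\epsilon$,
\[
\begin{aligned}
\left| \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f)\, df \right|
  &\le \int_{A_{XY}(f) \ge C} A_{XY}(f)\, df + C \int_{A_{XY}(f) < C} \widetilde{\mathcal{G}}_J(f)\, df \\
  &\le \frac{\epsilon}{2} + C \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f)\, df \\
  &\le \frac{\epsilon}{2} + \frac{C}{2^J} < \epsilon. \qquad \Box
\end{aligned}
\]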
The definition of the wavelet covariance can be expanded to the wavelet cross-covariance
$\gamma_{\tau,XY}(\lambda_j)$ by allowing one sequence of wavelet coefficients to be shifted by
a specific lag $\tau$; i.e.,
\[
\gamma_{\tau,XY}(\lambda_j) \equiv \frac{1}{2\lambda_j}
  \operatorname{Cov}\left\{ W^{(X)}_{j,t}, W^{(Y)}_{j,t+\tau} \right\}.
\]
Theorem 5.1 still holds since the spectral properties of $\{Y_{t+\tau}\}$ are identical to those
of $\{Y_t\}$.
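\[
\left| E\left\{ W^{(X)}_{j,t} W^{(Y)}_{j,t+\tau} \right\} \right|
  \le \left[ E\left\{ \left( W^{(X)}_{j,t} \right)^2 \right\} E\left\{ \left( W^{(Y)}_{j,t+\tau} \right)^2 \right\} \right]^{1/2}
  = 2^j \left[ \nu_X^2(\lambda_j)\, \nu_Y^2(\lambda_j) \right]^{1/2}
  = 2^j\, \nu_X(\lambda_j)\, \nu_Y(\lambda_j).
\]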
stationary process with zero mean and spectrum defined to be
$S_{1,X}(f) = \widetilde{\mathcal{H}}(f) S_X(f)$.
Percival and Walden (1999, Sec. 8.2) extended this result to arbitrary $j$ using
Equation (4.5), with a slight modification, to yield
\[
\mathcal{H}_j(f) = \mathcal{D}^{d}(f)\, \mathcal{D}^{L/2 - d}(f)
  \left[ \prod_{k=0}^{j-2} 4 \cos^2(2^k \pi f) \right]^{L/2}
  \mathcal{C}(2^{j-1} f)\, \mathcal{G}^{(D)}_{j-1}(f)
  = \mathcal{D}^{d}(f)\, \mathcal{A}_j(f).
\]
We recognize that $\widetilde{\mathcal{H}}_j(f)$ is a two-stage cascade filter, with the second
filter $\mathcal{A}_j(\cdot)$ having a form which can be factored into a filter of finite length
(Daubechies 1992, Ch. 6). The output from the first filter is a stationary process by design
($L \ge 2d$). Filtering a zero mean stationary process with a filter of finite length produces
a zero mean stationary process (Priestley 1981), which establishes the claim in general. We
now proceed to provide distributional results for estimators of the wavelet covariance.
where $\widetilde{N}_j = N - L_j + 1$. Note that the estimator does not include any coefficients
that make explicit use of the periodic boundary conditions. We can construct a
biased estimator of the wavelet covariance by simply including the MODWT wavelet
coefficients affected by the boundary in Equation (5.3) and renormalizing.
Following an argument similar to Brillinger (1979) to show the asymptotic normality
of $\tilde{\gamma}_{XY}(\lambda_j)$, we require the following central limit theorem.
106
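Lemma 5.2 Let $\mathbf{Z}_0, \ldots, \mathbf{Z}_{N-1}$ be a realization from the vector-valued process $\{\mathbf{Z}_t\}$, with mean vector $\boldsymbol{\mu}$, whose joint cumulant sequence is absolutely summable. Let
\[
\overline{\mathbf{Z}}_N \equiv \frac{1}{N} \sum_{t=0}^{N-1} \mathbf{Z}_t
\]
be the vector of sample means. Then $\overline{\mathbf{Z}}_N$ is asymptotically normally distributed with mean vector $\boldsymbol{\mu}$ and large sample variance $N^{-1} S_{\mathbf{Z}}(0)$, where $S_{\mathbf{Z}}(0)$ is the spectral matrix for $\{\mathbf{Z}_t\}$ evaluated at the frequency $f = 0$.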
\[
\int_{-1/2}^{1/2} S_{jX}^2(f)\,df < \infty \quad\text{and}\quad \int_{-1/2}^{1/2} S_{jY}^2(f)\,df < \infty,
\]
then the MODWT estimator $\tilde{\gamma}_{XY}(\lambda_j)$ of the wavelet covariance is asymptotically normally distributed with mean $\gamma_{XY}(\lambda_j)$ and large sample variance given by $\widetilde{N}_j^{-1} S_{j(XY)}(0)$, where $S_{j(XY)}(0)$ is the spectral density function for $\{\widetilde{W}^{(X)}_{j,t} \widetilde{W}^{(Y)}_{j,t}\}$ (the product of the wavelet coefficients).
Proof. Since $L > 2d$, we have that both sets of wavelet coefficients $\{\widetilde{W}^{(X)}_{j,t}\}$ and $\{\widetilde{W}^{(Y)}_{j,t}\}$ have mean zero. Since $\{\widetilde{W}^{(X)}_{j,t}, \widetilde{W}^{(Y)}_{j,t}\}$ is a bivariate Gaussian second-order stationary process, it is strictly stationary. Square integrability of the autospectra implies that
i.e., the autocovariance sequences and autospectra are Fourier transform pairs. Because $L > 2d$, the squared gain function for Daubechies wavelet filters guarantees we have
\[
S_{jX}(0) = 0 = \sum_{\tau=-\infty}^{\infty} s_{jX,\tau}.
\]
\[
\int_{-1/2}^{1/2} |S_{jXY}(f)|^2\,df
\le \int_{-1/2}^{1/2} S_{jX}(f) S_{jY}(f)\,df
\le \left[ \int_{-1/2}^{1/2} S_{jX}^2(f)\,df \int_{-1/2}^{1/2} S_{jY}^2(f)\,df \right]^{1/2} < \infty.
\]
So the cross-covariance sequence and cross spectrum associated with scale $\lambda_j$ are also a Fourier pair and, again, by using a Daubechies wavelet filter with $L > 2d$, we have $S_{jXY}(0) = 0$. Therefore, the cross-covariance sequence for $\{\widetilde{W}^{(X)}_{j,t}, \widetilde{W}^{(Y)}_{j,t}\}$ is absolutely summable.
We first note that the MODWT estimate of the wavelet covariance $\tilde{\gamma}_{XY}(\lambda_j)$ is essentially a sample mean for the time series $\widetilde{W}^{(XY)}_{j,t} \equiv \widetilde{W}^{(X)}_{j,t} \widetilde{W}^{(Y)}_{j,t}$ (cf. Equation 5.1). This process also has an absolutely summable cumulant sequence by Theorem 2.9.1 of Brillinger (1981, p. 38). Lemma 5.2 tells us that $\tilde{\gamma}_{XY}(\lambda_j)$ is asymptotically normal with mean $\gamma_{XY}(\lambda_j)$ and large sample variance given by $\widetilde{N}_j^{-1} S_{j(XY)}(0)$, where $S_{j(XY)}(0)$ is the spectral density for $\widetilde{W}^{(XY)}_{j,t}$ evaluated at $f = 0$.
It is easy to see that the estimated wavelet covariance is unbiased:
\[
E\{\tilde{\gamma}_{XY}(\lambda_j)\}
= \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} E\{\widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l}\}
= \frac{1}{2\lambda_j} E\{W^{(X)}_{j,l} W^{(Y)}_{j,l}\}.
\]
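Since we are exclusively interested in Gaussian processes, $S_{j(XY)}(0)$ may be re-expressed as a function of the auto and cross spectra of the wavelet coefficients $\{W^{(X)}_{j,l}\}$ and $\{W^{(Y)}_{j,l}\}$. The variance of the estimated MODWT wavelet covariance at scale $\lambda_j$ can be computed directly via
\begin{align*}
\operatorname{Var}\{\tilde{\gamma}_{XY}(\lambda_j)\}
&= \operatorname{Var}\left\{ \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} \widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l} \right\} \\
&= \frac{1}{\widetilde{N}_j^2} \sum_{l=L_j-1}^{N-1} \sum_{m=L_j-1}^{N-1} \operatorname{Cov}\{\widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l},\, \widetilde{W}^{(X)}_{j,m} \widetilde{W}^{(Y)}_{j,m}\} \\
&= \frac{1}{\widetilde{N}_j} \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \left(1 - \frac{|\tau|}{\widetilde{N}_j}\right) \operatorname{Cov}\{\widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l},\, \widetilde{W}^{(X)}_{j,l+\tau} \widetilde{W}^{(Y)}_{j,l+\tau}\} \\
&\equiv \frac{1}{\widetilde{N}_j} \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \left(1 - \frac{|\tau|}{\widetilde{N}_j}\right) s_{j(XY),\tau}, \tag{5.4}
\end{align*}
where $s_{j(XY),\tau}$ is the autocovariance sequence for the product of the scale $\lambda_j$ MODWT wavelet coefficients with respect to $\{X_t\}$ and $\{Y_t\}$.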
Let us look at the covariance between the product of wavelet coefficients more closely. We will need the following fact. If $A$, $B$, $C$, $D$ are real-valued Gaussian random variables with zero mean, then we can use the Isserlis theorem (Isserlis 1918) to claim
\[
\operatorname{Cov}\{AB, CD\} = \operatorname{Cov}\{A, C\}\operatorname{Cov}\{B, D\} + \operatorname{Cov}\{A, D\}\operatorname{Cov}\{B, C\}. \tag{5.5}
\]
Now let $\{U_t\}$ and $\{V_t\}$ be real-valued stationary processes with autospectra $S_U(\cdot)$ and $S_V(\cdot)$, respectively, and let $Z_t = U_t V_t$. Using Equation (5.5) we have
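\begin{align*}
\{s_{U,\tau}\} &\longleftrightarrow S_U(\cdot) \\
\{s_{V,\tau}\} &\longleftrightarrow S_V(\cdot) \\
\{C_{UV,\tau}\} &\longleftrightarrow S_{UV}(\cdot) \\
\{C_{VU,\tau}\} &\longleftrightarrow S_{VU}(\cdot).
\end{align*}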
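As a reading aid, the step from the Isserlis theorem to the autocovariance of the product process can be written out explicitly. The sketch below assumes $\{U_t, V_t\}$ is jointly Gaussian with zero mean and uses the indexing convention $C_{UV,\tau} = \operatorname{Cov}\{U_t, V_{t+\tau}\}$; that convention is an assumption made here for illustration.

```latex
\begin{align*}
s_{Z,\tau} &= \operatorname{Cov}\{U_t V_t,\; U_{t+\tau} V_{t+\tau}\} \\
 &= \operatorname{Cov}\{U_t, U_{t+\tau}\}\operatorname{Cov}\{V_t, V_{t+\tau}\}
  + \operatorname{Cov}\{U_t, V_{t+\tau}\}\operatorname{Cov}\{V_t, U_{t+\tau}\} \\
 &= s_{U,\tau}\, s_{V,\tau} + C_{UV,\tau}\, C_{VU,\tau}.
\end{align*}
```

Each product of sequences in the last line corresponds, under the Fourier transform, to a convolution of the associated spectra.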
Using the linearity and convolution properties of the Fourier transform (cf. Section A.2), the spectrum of $\{Z_t\}$ is
\[
S_Z(f) = \int_{-1/2}^{1/2} S_U(f') S_V(f - f')\,df' + \int_{-1/2}^{1/2} S_{UV}(f') S_{VU}(f - f')\,df'
\]
and therefore,
\begin{align*}
S_Z(0) &= \int_{-1/2}^{1/2} S_U(f') S_V(0 - f')\,df' + \int_{-1/2}^{1/2} S_{UV}(f') S_{VU}(0 - f')\,df' \\
&= \int_{-1/2}^{1/2} S_U(f) S_V(f)\,df + \int_{-1/2}^{1/2} S_{UV}(f) S_{VU}(-f)\,df \\
&= \int_{-1/2}^{1/2} S_U(f) S_V(f)\,df + \int_{-1/2}^{1/2} S_{UV}^2(f)\,df.
\end{align*}
then the number of DWT wavelet coefficients affected by the boundary is $L'_j$ (Percival and Walden 1999, Sec. 4.10). So we can define our unbiased estimator of the wavelet
where $N_j = N/2^j$ and $\widehat{N}_j = N_j - L'_j$. Again, including those DWT coefficients affected by the boundary in Equation (5.8) yields the biased DWT estimator of wavelet covariance.
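It is easy to show that $\hat{\gamma}_{XY}(\lambda_j)$ is an unbiased estimator of the wavelet covariance for scale $\lambda_j$ via
\[
E\{\hat{\gamma}_{XY}(\lambda_j)\}
= E\left\{ \frac{1}{2\lambda_j \widehat{N}_j} \sum_{l=L'_j}^{N_j-1} W^{(X)}_{j,l} W^{(Y)}_{j,l} \right\}
= \frac{1}{2\lambda_j} E\{W^{(X)}_{j,l} W^{(Y)}_{j,l}\},
\]
which is equivalent to Equation (5.1). In fact, from Theorem 5.2 we know that $\hat{\gamma}_{XY}(\lambda_j)$ is unbiased with large sample variance $V'_j/\widehat{N}_j$, where $V'_j$ involves the auto- and cross spectra of the scale $\lambda_j$ DWT wavelet coefficients. The large sample variance for the DWT wavelet covariance $\hat{\gamma}_{XY}(\lambda_j)$ follows from Equation (5.6) and is defined to be
\[
V'_j \equiv \int_{-1/2}^{1/2} S'_{jX}(f) S'_{jY}(f)\,df + \int_{-1/2}^{1/2} \left[S'_{jXY}(f)\right]^2 df.
\]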
It has already been stated (Section 3.3) that the DWT coefficients may be obtained from subsampling the MODWT coefficients on a scale by scale basis. We used this fact to define the spectrum of DWT wavelet coefficients for scale $\lambda_j$ in Section 4.1. For our purposes here, we explicitly define the auto and cross spectrum for the scale $\lambda_j$ DWT wavelet coefficients to be
\[
S'_{jX}(f) \equiv \frac{1}{2^j} \sum_{k=0}^{2^j-1} \widetilde{H}_j\!\left(\frac{f}{2^j} + \frac{k}{2^j}\right) S_X\!\left(\frac{f}{2^j} + \frac{k}{2^j}\right) \tag{5.9}
\]
and
\[
S'_{jXY}(f) \equiv \frac{1}{2^j} \sum_{k=0}^{2^j-1} \widetilde{H}_j\!\left(\frac{f}{2^j} + \frac{k}{2^j}\right) S_{XY}\!\left(\frac{f}{2^j} + \frac{k}{2^j}\right), \tag{5.10}
\]
respectively, where $\widetilde{H}_j(\cdot)$ is the squared gain function for the scale $\lambda_j$ MODWT wavelet coefficients.
Let us look at a simple relationship between two processes, linear regression with delay; see, e.g., Priestley (1981, pp. 663-664). Suppose two time series $\{X_t\}$ and $\{Y_t\}$, with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively, are related via
\[
Y_t = c X_{t-d} + \epsilon_t.
\]
Using properties of the Fourier transform (cf. Section A.2), we know their spectra are related via
\[
S_Y(f) = c^2 S_X(f) + \sigma_\epsilon^2 \Delta t.
\]
Their cross spectrum is given by $S_{XY}(f) = c\, e^{-i 2\pi f d \Delta t} S_X(f)$, with co-spectrum taking the form $R_{XY}(f) = c \cos(2\pi f d \Delta t) S_X(f)$ and quadrature spectrum $Q_{XY}(f) = c \sin(2\pi f d \Delta t) S_X(f)$. For simplicity, we assume the time series $\{X_t\}$ is white noise with variance $\sigma_X^2$. Applying these definitions to the DWT wavelet coefficients of $\{X_t\}$ for unit scale, we have
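\[
S'_{1X}(f) = \frac{\widetilde{H}_1\!\left(\tfrac{1}{2} f\right) S_X\!\left(\tfrac{1}{2} f\right) + \widetilde{H}_1\!\left(\tfrac{1}{2} f + \tfrac{1}{2}\right) S_X\!\left(\tfrac{1}{2} f + \tfrac{1}{2}\right)}{2} = \frac{\sigma_X^2}{2}
\]
since $\widetilde{H}_1\!\left(\tfrac{1}{2} f\right) + \widetilde{H}_1\!\left(\tfrac{1}{2} f + \tfrac{1}{2}\right) = 1$ by Equation (3.5). Knowing this, the spectrum for the DWT wavelet coefficients of $\{Y_t\}$ is
\[
S'_{1Y}(f) = \frac{c^2 \sigma_X^2 + \sigma_\epsilon^2}{2}.
\]
The corresponding cross spectrum is
\begin{align*}
S'_{1XY}(f) &= \frac{c\,\sigma_X^2}{2}\, e^{-i\pi f d} \left[ \widetilde{H}_1\!\left(\tfrac{1}{2} f\right) + (-1)^d\, \widetilde{H}_1\!\left(\tfrac{1}{2} f + \tfrac{1}{2}\right) \right] \\
&= \frac{c\,\sigma_X^2}{2}\, e^{-i\pi f d} \times
\begin{cases}
1, & d \text{ even} \\
\widetilde{H}_1\!\left(\tfrac{1}{2} f\right) - \widetilde{H}_1\!\left(\tfrac{1}{2} f + \tfrac{1}{2}\right), & d \text{ odd.}
\end{cases}
\end{align*}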
Hence, the variance of $\hat{\gamma}_{XY}(\lambda_1)$ will take on two different values depending upon the delay between $\{X_t\}$ and $\{Y_t\}$. The MODWT estimator of the wavelet covariance, because of its lack of subsampling, does not suffer from this problem.
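The dependence on the parity of the delay can be checked numerically. The following is a minimal sketch, assuming the unit-scale Haar MODWT wavelet filter, whose squared gain function is $\sin^2(\pi f)$; the function names are hypothetical and chosen only for this illustration.

```python
import math

def haar_modwt_gain(f):
    """Squared gain of the unit-scale Haar MODWT wavelet filter: sin^2(pi f).
    (Assumed filter; any filter whose gain satisfies H(f) + H(f + 1/2) = 1 would do.)"""
    return math.sin(math.pi * f) ** 2

def unit_scale_cross_factor(f, d):
    """Bracketed factor H1(f/2) + (-1)^d H1(f/2 + 1/2) for an integer delay d."""
    sign = 1.0 if d % 2 == 0 else -1.0
    return haar_modwt_gain(f / 2) + sign * haar_modwt_gain(f / 2 + 0.5)

# Even delays give a constant factor; odd delays give a frequency-dependent one.
for d in range(4):
    values = [round(unit_scale_cross_factor(f, d), 4) for f in (0.0, 0.25, 0.5)]
    print(d, values)
```

For the Haar filter the odd-delay factor simplifies to $-\cos(\pi f)$, which is why only two distinct behaviours appear at unit scale.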
Table 5.1: Variance of $\hat{\gamma}_{XY}(\lambda_j)$, $j = 1, \ldots, 6$, for two white noise time series associated via linear regression with delay $d$. The series $\{X_t\}$ is a white noise process with $\sigma_X^2 = 1$, $c = 1$ and $\sigma_\epsilon^2 = 0$.

            Level
  d      1        2        3        4        5        6
  0   0.9933   0.5002   0.2452   0.1247   0.0613   0.0297
  1   0.7477   0.2803   0.1732   0.1045   0.0563   0.0286
  2   0.9912   0.3744   0.1402   0.0887   0.0512   0.0271
  3   0.7468   0.2799   0.1412   0.0767   0.0461   0.0262
  4   0.9893   0.4981   0.1828   0.0697   0.0428   0.0253
  5   0.7455   0.2801   0.1422   0.0678   0.0396   0.0244
  6   0.9877   0.3734   0.1399   0.0709   0.0370   0.0231
  7   0.7446   0.2799   0.1717   0.0780   0.0352   0.0217
  8   0.9861   0.4964   0.2434   0.0915   0.0340   0.0211
the variance of $\hat{\gamma}_{XY}(\lambda_2)$, and that is exactly what is displayed in Table 5.1. Although not shown here, the pattern continues with 8 distinct values for $\hat{\gamma}_{XY}(\lambda_3)$, 16 distinct values for $\hat{\gamma}_{XY}(\lambda_4)$, and so on.
A common method in spectral analysis to overcome bias due to "misalignment" when computing bivariate estimators is to simply shift (translate) one series relative to the other. One method to determine the lead/lag amount between the two series is to compute the estimated cross-covariance sequence $\{\hat{s}^{(p)}_{XY,\tau}\}$ and look for a maximum. Where the maximum occurs indicates the number of units to shift one series. Here, we may compute the MODWT estimated wavelet cross-covariance sequence $\{\tilde{\gamma}_{\tau,XY}(\lambda_j)\}$ and look for the maximum at each scale. Shifting one set of wavelet coefficients by these amounts may overcome this misalignment problem in practice if the DWT estimator of the wavelet covariance is desired.
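The lag search described above can be sketched in a few lines. This is an illustrative example, not the procedure used to produce any result in this chapter: it uses a plain biased cross-covariance estimator on two raw series, and the function names and the simulated data are hypothetical. In the wavelet setting the two inputs would be the scale $\lambda_j$ wavelet coefficients of each series.

```python
import random

def cross_covariance(x, y, lag):
    """Biased estimate of Cov{x_t, y_(t+lag)}: sum over valid t, divided by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    start, stop = max(0, -lag), min(n, n - lag)
    return sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(start, stop)) / n

def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] at which the estimated cross-covariance is largest."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: cross_covariance(x, y, k))

# Simulated example: y is x delayed by d samples (hypothetical data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
d = 3
y = [0.0] * d + x[:-d]          # y_t = x_(t-d)
shift = best_lag(x, y, max_lag=10)
print(shift)                    # number of units by which to shift one series
```

Only the location of the maximum is used; if the relationship is negative one would search for the largest absolute value instead.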
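\[
\begin{cases}
\qquad\vdots \\
\cdots\; \widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l+\tau}, & \tau = -1, \ldots, -(\widetilde{N}_j - 1) \\
0, & \text{otherwise.}
\end{cases}
\]
The bias is due to the factor $1/\widetilde{N}_j$ remaining constant for all lags. We are still not using wavelet coefficients which make use of the periodic boundary conditions.
Just as with the wavelet covariance, we can define a biased estimator of the wavelet cross-covariance based on the DWT to be
\[
\hat{\gamma}_{\tau,XY}(\lambda_j) \equiv
\begin{cases}
\dfrac{1}{2\lambda_j \widehat{N}_j} \displaystyle\sum_{l=L'_j}^{N_j-\tau-1} W^{(X)}_{j,l} W^{(Y)}_{j,l+\tau}, & \tau = 0, \ldots, \widehat{N}_j - 1 \\[2ex]
\dfrac{1}{2\lambda_j \widehat{N}_j} \displaystyle\sum_{l=L'_j-\tau}^{N_j-1} W^{(X)}_{j,l} W^{(Y)}_{j,l+\tau}, & \tau = -1, \ldots, -(\widehat{N}_j - 1) \\[2ex]
0, & \text{otherwise.}
\end{cases} \tag{5.12}
\]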
This estimator is biased for the same reason as Equation (5.11). This quantity is provided for completeness; as stated in Section 5.2.2, the inherent subsampling of the DWT will result in the variance of $\hat{\gamma}_{XY}(\lambda_j)$ being $2^j$-periodic unless the two series are properly aligned.
where $\tilde{\gamma}_{\tau,XY}(\lambda_j)$ is given in Equation (5.11), and $\tilde{\nu}_X^2(\lambda_j)$ and $\tilde{\nu}_Y^2(\lambda_j)$ are given in Equation (3.11). When $\tau = 0$ we obtain the MODWT estimator of the wavelet correlation between $\{X_t, Y_t\}$.
Large sample theory for the cross-correlation is more difficult to come by than for the cross-covariance. The following result can be found in Fuller (1996, p. 342).
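Proposition 5.3 If $\{\widetilde{W}^{(X)}_{j,t}, \widetilde{W}^{(Y)}_{j,t}\}$ is a bivariate Gaussian weakly stationary process and if all autocovariance and cross-covariance sequences are absolutely summable, then
\begin{align*}
\lim_{N \to \infty} \widetilde{N}_j \operatorname{Cov}\{\tilde{\rho}_{\tau,XY}(\lambda_j), \tilde{\rho}_{\kappa,XY}(\lambda_j)\}
= \sum_{t=-\infty}^{\infty} \Big\{ & \rho_{t,X}(\lambda_j)\rho_{t+\kappa-\tau,Y}(\lambda_j) + \rho_{t+\kappa,XY}(\lambda_j)\rho_{t-\tau,YX}(\lambda_j) \\
& - \rho_{\tau,XY}(\lambda_j)\left[\rho_{t,X}(\lambda_j)\rho_{t+\kappa,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j)\rho_{t-\kappa,YX}(\lambda_j)\right] \\
& - \rho_{\kappa,XY}(\lambda_j)\left[\rho_{t,X}(\lambda_j)\rho_{t+\tau,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j)\rho_{t-\tau,YX}(\lambda_j)\right] \\
& + \rho_{\tau,XY}(\lambda_j)\rho_{\kappa,XY}(\lambda_j)\left[\tfrac{1}{2}\rho_{t,X}^2(\lambda_j) + \rho_{t,XY}^2(\lambda_j) + \tfrac{1}{2}\rho_{t,Y}^2(\lambda_j)\right] \Big\}.
\end{align*}
\begin{align*}
\cdots\; & - 2\rho_{0,XY}(\lambda_j)\left[\rho_{t,X}(\lambda_j)\rho_{t,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j)\rho_{t,YX}(\lambda_j)\right] \\
& + \rho_{0,XY}^2(\lambda_j)\left[\tfrac{1}{2}\rho_{t,X}^2(\lambda_j) + \rho_{t,XY}^2(\lambda_j) + \tfrac{1}{2}\rho_{t,Y}^2(\lambda_j)\right] \Big\} \tag{5.14}
\end{align*}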
where
\[
\rho_{t,X}(\lambda_j) \equiv \frac{\frac{1}{2\lambda_j} E\{W^{(X)}_{j,l} W^{(X)}_{j,l+|t|}\}}{\nu_X^2(\lambda_j)} \tag{5.15}
\]
is the lag-$t$ wavelet autocorrelation at scale $\lambda_j$ for the process $\{X_t\}$.
Brillinger (1979) constructed approximate confidence intervals for the auto- and cross-correlation sequences of bivariate stationary time series. We present a brief outline of his result for the MODWT estimated wavelet cross-correlation coefficients in the form of a theorem.
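Theorem 5.3 Let $L > 2d$, and suppose $\{\widetilde{W}^{(X)}_{j,t}, \widetilde{W}^{(Y)}_{j,t}\}$ is a bivariate Gaussian weakly stationary process with square integrable autospectra. Then the MODWT estimator $\tilde{\rho}_{XY}(\lambda_j)$ of the wavelet correlation is asymptotically normally distributed with mean $\rho_{XY}(\lambda_j)$ and large sample variance given by Equation (5.14).
Proof. Since $L > 2d$, we have that both sets of wavelet coefficients $\{\widetilde{W}^{(X)}_{j,t}\}$ and $\{\widetilde{W}^{(Y)}_{j,t}\}$ have mean zero. Let us define
\[
A_{j,l} \equiv \left[\widetilde{W}^{(X)}_{j,l}\right]^2, \quad
B_{j,l} \equiv \left[\widetilde{W}^{(Y)}_{j,l}\right]^2 \quad\text{and}\quad
C_{j,l} \equiv \widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l}
\]
\begin{align*}
\overline{A}_j &\equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} A_{j,l} = \tilde{\nu}_X^2(\lambda_j), \\
\overline{B}_j &\equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} B_{j,l} = \tilde{\nu}_Y^2(\lambda_j) \quad\text{and} \\
\overline{C}_j &\equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} C_{j,l} = \tilde{\gamma}_{XY}(\lambda_j).
\end{align*}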
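The vector-valued process $\{A_{j,t}, B_{j,t}, C_{j,t}\}$ has an absolutely summable joint cumulant sequence by Theorem 2.9.1 of Brillinger (1981, p. 38). Hence, from Lemma 5.2 the vector of sample means $\{\overline{A}_j, \overline{B}_j, \overline{C}_j\}$ is asymptotically normally distributed with mean vector $\{\nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j)\}$, and large sample variance given by $\widetilde{N}_j^{-1} S_{jABC}(0)$, where $S_{jABC}(\cdot)$ is the $3 \times 3$ spectral matrix for $\{A_{j,t}, B_{j,t}, C_{j,t}\}$ (cf. Section C.1).
The MODWT estimator of the wavelet correlation $\tilde{\rho}_{XY}(\lambda_j)$ is essentially a function of these sample means $g(\overline{A}_j, \overline{B}_j, \overline{C}_j)$, where $g(x, y, z) \equiv z/\sqrt{xy}$. Appealing to Mann and Wald (1943), we have that $\tilde{\rho}_{XY}(\lambda_j)$ is asymptotically normally distributed with mean $\rho_{XY}(\lambda_j)$ and large sample variance
\[
\widetilde{N}_j^{-1}\, \dot{g}\!\left(\nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j)\right)^T S_{jABC}(0)\; \dot{g}\!\left(\nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j)\right). \tag{5.16}
\]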
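\begin{align*}
S_{jAA}(0) &= 2 \int_{-1/2}^{1/2} S_{jX}^2(f)\,df \\
S_{jBB}(0) &= 2 \int_{-1/2}^{1/2} S_{jY}^2(f)\,df \\
S_{jCC}(0) &= \int_{-1/2}^{1/2} S_{jX}(f) S_{jY}(f)\,df + \int_{-1/2}^{1/2} S_{jXY}^2(f)\,df \\
S_{jAB}(0) &= 2 \int_{-1/2}^{1/2} S_{jXY}(f) S_{jYX}(f)\,df \\
S_{jAC}(0) &= 2 \int_{-1/2}^{1/2} S_{jX}(f) S_{jYX}(f)\,df \quad\text{and} \\
S_{jBC}(0) &= 2 \int_{-1/2}^{1/2} S_{jY}(f) S_{jYX}(f)\,df.
\end{align*}
\[
\dot{g}\!\left(\nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j)\right) =
\left[
-\frac{\gamma_{XY}(\lambda_j)}{2\nu_X^2(\lambda_j)\sqrt{\nu_X^2(\lambda_j)\nu_Y^2(\lambda_j)}},\;
-\frac{\gamma_{XY}(\lambda_j)}{2\nu_Y^2(\lambda_j)\sqrt{\nu_X^2(\lambda_j)\nu_Y^2(\lambda_j)}},\;
\frac{1}{\sqrt{\nu_X^2(\lambda_j)\nu_Y^2(\lambda_j)}}
\right]^T
\]
\begin{align*}
& \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^6(\lambda_j)\nu_Y^2(\lambda_j)} S_{jAA}(0)
+ \frac{\gamma_{XY}^2(\lambda_j)}{2\nu_X^4(\lambda_j)\nu_Y^4(\lambda_j)} S_{jAB}(0)
+ \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^2(\lambda_j)\nu_Y^6(\lambda_j)} S_{jBB}(0) \\
& \quad + \frac{1}{\nu_X^2(\lambda_j)\nu_Y^2(\lambda_j)} S_{jCC}(0)
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^4(\lambda_j)\nu_Y^2(\lambda_j)} S_{jAC}(0)
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^2(\lambda_j)\nu_Y^4(\lambda_j)} S_{jBC}(0).
\end{align*}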
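Utilizing Parseval's relation, each auto and cross spectrum in $S_{jABC}(0)$ can be approximated by a sum of squared auto or cross-covariance sequences, respectively.
\begin{align*}
\frac{1}{\widetilde{N}_j} \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \Big\{
& \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^6(\lambda_j)\nu_Y^2(\lambda_j)}\, 2 s_{jX,\tau}^2
+ \frac{\gamma_{XY}^2(\lambda_j)}{2\nu_X^4(\lambda_j)\nu_Y^4(\lambda_j)}\, 2 C_{jXY,\tau} C_{jYX,\tau} \\
& + \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^2(\lambda_j)\nu_Y^6(\lambda_j)}\, 2 s_{jY,\tau}^2
+ \frac{1}{\nu_X^2(\lambda_j)\nu_Y^2(\lambda_j)} \left( s_{jX,\tau} s_{jY,\tau} + C_{jXY,\tau}^2 \right) \\
& - \frac{\gamma_{XY}(\lambda_j)}{\nu_X^4(\lambda_j)\nu_Y^2(\lambda_j)}\, 2 s_{jX,\tau} C_{jYX,\tau}
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^2(\lambda_j)\nu_Y^4(\lambda_j)}\, 2 s_{jY,\tau} C_{jYX,\tau} \Big\}.
\end{align*}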
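Each of the autocovariance terms is equivalent to the wavelet autocovariance for scale $\lambda_j$ (defined by letting both wavelet coefficients come from the same process in Equation (5.1)) and each cross-covariance term is equivalent to the wavelet cross-covariance for scale $\lambda_j$. Using these quantities, Equation (5.16) may finally be expressed as a function of auto and cross-correlations based on the wavelet coefficients
\begin{align*}
\frac{1}{\widetilde{N}_j} \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \Big\{
& \rho_{\tau,X}(\lambda_j)\rho_{\tau,Y}(\lambda_j) + \rho_{\tau,XY}^2(\lambda_j) \\
& - 2\rho_{0,XY}(\lambda_j)\left[\rho_{\tau,X}(\lambda_j)\rho_{\tau,YX}(\lambda_j) + \rho_{\tau,Y}(\lambda_j)\rho_{\tau,YX}(\lambda_j)\right] \\
& + \rho_{0,XY}^2(\lambda_j)\left[\tfrac{1}{2}\rho_{\tau,X}^2(\lambda_j) + \rho_{\tau,XY}(\lambda_j)\rho_{\tau,YX}(\lambda_j) + \tfrac{1}{2}\rho_{\tau,Y}^2(\lambda_j)\right] \Big\}
\end{align*}
which is (almost) equivalent to Equation (5.14), for large $\widetilde{N}_j$. $\Box$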
5.4 Confidence Intervals for the Wavelet Covariance and Correlation
5.4.1 Wavelet Covariance
We now discuss how to formulate confidence intervals for the estimators of the wavelet covariance by making use of the large sample result in Equation (5.6). This was previously given in Lindsay, Percival, and Rothrock (1996). We will use the periodogram (Equation (B.3)) and the cross-periodogram (Equation (C.3)) to help estimate the quantities of interest. First, we simply use the periodogram $\widehat{S}^{(p)}_{jX}(\cdot)$ of
and the corresponding biased estimator of the cross-covariance sequence associated with the scale $\lambda_j$ MODWT wavelet coefficients of $\{X_t, Y_t\}$ by
\[
\widehat{C}^{(p)}_{jXY,\tau} \equiv \frac{1}{\widetilde{N}_j} \sum_{l} \widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l+\tau},
\]
where the summation goes from $l = L_j - 1$ to $N - 1 - \tau$ for $\tau \ge 0$ and from $l = L_j - 1 - \tau$ to $N - 1$ for $\tau < 0$. Substituting the periodogram estimates of the autospectra and cross spectrum into Equation (5.6) gives us an estimator $\widetilde{V}_j$ for the large sample variance of the MODWT estimator of the wavelet covariance.
We can use Parseval's relation to obtain an alternative representation for $\widetilde{V}_j$ that uses only the autocovariance and cross-covariance sequences instead of the autospectra and cross spectrum. Specifically, the integral of the product of the autospectra is
jY 2
b
C (p)
jXY : (5.17)
2 =1 =;(Ne ;1) j
Under the assumption that the spectral estimates are close to the true values, an approximate $100(1 - 2p)\%$ confidence interval for $\gamma_{XY}(\lambda_j)$ is
\[
\left[ \tilde{\gamma}_{XY}(\lambda_j) - \Phi^{-1}(1 - p) \sqrt{\frac{\widetilde{V}_j}{\widetilde{N}_j}},\;\;
\tilde{\gamma}_{XY}(\lambda_j) + \Phi^{-1}(1 - p) \sqrt{\frac{\widetilde{V}_j}{\widetilde{N}_j}} \right]
\]
where $\Phi^{-1}(p)$ is the $p \times 100\%$ percentage point for the standard normal distribution. Replacing the MODWT wavelet coefficients with their DWT counterparts, and adjusting for the number of wavelet coefficients, will lead to an analogous confidence interval for the DWT estimator of the wavelet covariance.
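The interval above is straightforward to compute once the point estimate, the variance estimate and the number of coefficients are in hand. A minimal sketch follows; the function name and the numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import NormalDist

def wavelet_cov_interval(gamma_hat, v_hat, n_coef, p=0.025):
    """Approximate 100(1 - 2p)% interval: gamma_hat +/- z * sqrt(v_hat / n_coef),
    where z is the (1 - p) quantile of the standard normal distribution."""
    z = NormalDist().inv_cdf(1.0 - p)
    half_width = z * (v_hat / n_coef) ** 0.5
    return gamma_hat - half_width, gamma_hat + half_width

# Hypothetical inputs: a covariance estimate, a variance estimate and a coefficient count.
lower, upper = wavelet_cov_interval(gamma_hat=0.12, v_hat=0.375, n_coef=511)
print(lower, upper)
```

With `p=0.025` this yields an approximate 95% interval, symmetric about the point estimate.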
the non-normality of the correlation coefficient for small sample sizes, a nonlinear transformation is sometimes required: Fisher's z-transformation (Fisher 1915; Kotz, Johnson, and Read 1982, Volume 3). Let
\[
h(\rho) \equiv \frac{1}{2} \log\left(\frac{1 + \rho}{1 - \rho}\right) = \tanh^{-1}(\rho)
\]
define the transformation. For the estimated correlation coefficient $\hat{\rho}$, based on $n$ independent samples, $\sqrt{n - 3}\,(h(\hat{\rho}) - h(\rho))$ has approximately a $N(0, 1)$ distribution. The factor $\sqrt{n - 3}$ leads to a better approximation of the distribution (David 1966).
An approximate $100(1 - 2p)\%$ confidence interval for $\rho_{XY}(\lambda_j)$ based on the MODWT is therefore
\[
\left[ \tanh\left\{ h[\tilde{\rho}_{XY}(\lambda_j)] - \frac{\Phi^{-1}(1 - p)}{\sqrt{\widehat{N}_j - 3}} \right\},\;\;
\tanh\left\{ h[\tilde{\rho}_{XY}(\lambda_j)] + \frac{\Phi^{-1}(1 - p)}{\sqrt{\widehat{N}_j - 3}} \right\} \right]
\]
where $\widehat{N}_j$ is the number of DWT wavelet coefficients associated with scale $\lambda_j$. Note that I am using the number of wavelet coefficients as if I had computed the point estimates using the DWT. This is done to provide a "better" estimate of the sample size with respect to the number of approximately uncorrelated observations. The assumption of uncorrelated observations is only valid if we believe no systematic trends or nonstationary features exist at that scale. If an equivalent degrees of freedom argument were available for the wavelet covariance, this could be utilized instead of $\widehat{N}_j$ (cf. Section 7.5). The primary benefit here is that, by utilizing the variance stabilizing transformation $h(\cdot)$, we can avoid estimating the large sample variance for $\tilde{\rho}_{XY}(\lambda_j)$ in Equation (5.14).
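The transformed interval can be sketched as follows. The function name and the example inputs are hypothetical; the only ingredients are the correlation estimate and the number of DWT coefficients at the scale of interest.

```python
import math
from statistics import NormalDist

def wavelet_cor_interval(rho_hat, n_dwt, p=0.025):
    """Approximate 100(1 - 2p)% interval for a correlation via Fisher's z-transformation.
    n_dwt plays the role of the number of approximately uncorrelated observations."""
    if n_dwt <= 3:
        raise ValueError("need more than 3 coefficients")
    z = NormalDist().inv_cdf(1.0 - p)
    center = math.atanh(rho_hat)               # h(rho) = arctanh(rho)
    half_width = z / math.sqrt(n_dwt - 3)
    return math.tanh(center - half_width), math.tanh(center + half_width)

# Hypothetical example: the same estimate at a fine scale (many coefficients)
# and at a coarse scale (few coefficients).
print(wavelet_cor_interval(0.5, 256))
print(wavelet_cor_interval(0.5, 16))
```

Because the back-transformation is $\tanh$, the interval always lies inside $(-1, 1)$ and is generally asymmetric about the point estimate, widening as the number of coefficients drops at coarser scales.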
where
\[
\overline{E}_{jXY} \equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} \widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l}
\]
(cf. Equation 5.4), and the variance of $\tilde{\gamma}_{XY}(\lambda_j)$ can be approximated by $\widetilde{\widetilde{V}}_j/\widetilde{N}_j$. If we are interested in the performance of one estimator versus the other in practice, then we can look at the moment properties of these estimators. The following sections investigate the bias and mean squared error of $\widetilde{\widetilde{V}}_j$ and $\widetilde{V}_j$. For the bias, explicit calculations are made and backed up with simulation results. Due to the complexity of calculating the mean squared error explicitly, only simulation results are provided.
where $S_{j(XY)}(\cdot)$ is the spectral density of the product of the wavelet coefficients $\widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l}$. The bias for $\widetilde{\widetilde{V}}_j$ is therefore given by
\[
\operatorname{bias}\{\widetilde{\widetilde{V}}_j\} = E\{\widetilde{\widetilde{V}}_j\} - V_j = E\{\widetilde{\widetilde{V}}_j\} - S_{j(XY)}(0). \tag{5.19}
\]
At first glance, it is difficult to determine the bias of $\widetilde{\widetilde{V}}_j$ simply from Equation (5.19). A surprising result comes from Percival (1993) when we restrict ourselves to the biased estimator of the acvs in practice, i.e., before taking its expectation. Specifically, when the process mean is unknown, the biased estimator of the acvs $\{\hat{s}^{(p)}_{j(XY),\tau}\}$ obeys
\[
\sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \hat{s}^{(p)}_{j(XY),\tau} = 0.
\]
So we have that, for large sample sizes, the quantity $\widetilde{\widetilde{V}}_j$ will be approximately zero and the empirical bias will therefore be approximately equal to $-V_j$. This fact is confirmed, through Monte Carlo simulation, below.
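The sum-to-zero constraint is an algebraic identity and can be verified on any sequence: when the sample mean is subtracted, the two-sided sum of the biased autocovariance estimates equals the squared sum of the centred values divided by the sample size, which is zero. A minimal sketch, with a hypothetical function name and simulated data standing in for the product of wavelet coefficients:

```python
import random

def biased_acvs(x):
    """Biased autocovariance estimates s_0, ..., s_(N-1), centred at the sample mean."""
    n = len(x)
    mean = sum(x) / n
    centred = [value - mean for value in x]
    return [sum(centred[t] * centred[t + k] for t in range(n - k)) / n for k in range(n)]

# Simulated stand-in series (hypothetical data).
rng = random.Random(1)
x = [rng.gauss(5.0, 2.0) for _ in range(64)]
s = biased_acvs(x)

# Sum over lags -(N-1), ..., N-1, using the symmetry s_(-k) = s_k.
total = s[0] + 2.0 * sum(s[1:])
print(total)    # zero up to floating-point rounding
```

The identity holds regardless of the distribution or dependence structure of the data; it is purely a consequence of centring at the sample mean.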
where $\widehat{S}_{XY}(\cdot)$, $\widehat{R}_{XY}(\cdot)$ and $\widehat{Q}_{XY}(\cdot)$ are estimates of the cross, co- and quadrature spectra. That is, I can write $S_{XY}(f) = R_{XY}(f) - iQ_{XY}(f)$ (cf. Section C.1). The fraction $C_W/N$ involves the smoothing window $W_m(\cdot)$ applied to the spectral estimator. Using Equation (B.5), we can re-express this fraction as
R f W 2 ()d
CW = = C2
(N )
;f m
(N )
N N h
where $C_h$ is based on the type of data taper used and $\nu$ is the equivalent degrees of freedom for the spectral estimator (cf. Section B.3). Assuming no tapering (i.e., $C_h = 1$) and utilizing Equations (5.20)-(5.22), we can write
n o n o n o n o
E SbX (f )SbY (f ) = Cov SbX (f ) SbY (f ) + E SbX (f ) E SbY (f )
2jSXY (f )j + SX (f )SY (f ) f 6= 0 1=2
2
n o n o n o2
E Rb2XY (f ) = Var RbXY (f ) + E RbXY (f )
SX (f )SY (f ) + RXY (f ) ; QXY (f ) + R2XY (f ) f 6= 0 1=2
2 2
n b2 o n b o n b o2
E QXY (f ) = Var QXY (f ) + E QXY (f )
SX (f )SY (f ) + QXY (f ) ; RXY (f ) + Q2XY (f ) f =6 0 1=2:
2 2
Note that all the above quantities are real-valued random variables. The integrals of the squared cross spectrum and magnitude squared cross spectrum can both be expressed through their co- and quadrature spectra, i.e.,
\[
\int_{-1/2}^{1/2} |S_{XY}(f)|^2\,df = \int_{-1/2}^{1/2} R_{XY}^2(f)\,df + \int_{-1/2}^{1/2} Q_{XY}^2(f)\,df
\]
and
\[
\int_{-1/2}^{1/2} S_{XY}^2(f)\,df = \int_{-1/2}^{1/2} R_{XY}^2(f)\,df - 2i \int_{-1/2}^{1/2} R_{XY}(f) Q_{XY}(f)\,df - \int_{-1/2}^{1/2} Q_{XY}^2(f)\,df. \tag{5.23}
\]
where
\[
e_{XY,\tau} \equiv \frac{C_{XY,\tau} + C_{XY,-\tau}}{2} \quad\text{and}\quad o_{XY,\tau} \equiv \frac{C_{XY,\tau} - C_{XY,-\tau}}{2}
\]
are even and odd sequences, respectively, based on the cross-covariance sequence. Therefore, Equation (5.23) reduces to
\[
\int_{-1/2}^{1/2} S_{XY}^2(f)\,df = \int_{-1/2}^{1/2} R_{XY}^2(f)\,df - \int_{-1/2}^{1/2} Q_{XY}^2(f)\,df.
\]
This gives an approximate expectation, since the frequencies $f = 0, \pm 1/2$ are included in the integrals, of
ne o 1 Z 12 n o 1 Z 21 n o
E Vj = 2 1 E SbjX (f )SbjY (f ) df + 2 1 E SbjXY 2 (f ) df
Z Z
;2 ;2
1 1
2 n b b o 1 21 n b2 o
= 2 1 E SjX (f )SjY (f ) df + 2 1 E RjXY (f ) df
Z
;2 ;2
n o
; 21 1 E Qb2jXY (f ) df
1
2
Z j
;2
Z 12
1 2 S jXY (f )j2 1 1
1
2 1
2
+ SjX (f )SjY (f ) df + + 2 2 (f ) df
SjXY
;2 ;21
When the magnitude squared coherence between the two processes is unity, then $S_{jX}(f) S_{jY}(f) = |S_{jXY}(f)|^2$ and therefore
ne o Z Z
E Vj
1 + 12 1+1
1 1
2 2
S ( f ) S ( f ) df + S 2 (f ) df: (5.24)
jX jY jXY
; 12 2 1
;2
MODWT estimator of the wavelet covariance $\tilde{\gamma}_{XY}(\lambda_j)$ is equivalent to the MODWT estimator of the wavelet variance $\tilde{\nu}_X^2(\lambda_j)$ and Equation (5.24) reduces to the quantity $A_{W_j}$, where $A_{W_j}/\widetilde{N}_j$ is the large sample variance of $\tilde{\nu}_X^2(\lambda_j)$ in Percival (1995).
which depends upon the squared gain function of the wavelet filter for scale $\lambda_j$. While the squared gain function for the first scale is easy enough to compute analytically,
Table 5.2: Empirical bias and mean squared error (mse) of $\widetilde{V}_j$, $j = 1, \ldots, 6$, for uncorrelated white noise processes (N = 512), based on M = 500 realizations.

                        Haar                                      D(4)
 Level   V_j(Haar)   ave.      bias       mse      V_j(D4)   ave.      bias       mse
   1      0.3750    0.3763    0.00127   0.00317    0.4102   0.4063    0.00383   0.00336
   2      0.1094    0.1086   -0.00076   0.00318    0.1315   0.1294   -0.00210   0.00056
   3      0.0449    0.0443   -0.00064   0.00010    0.0620   0.0613   -0.00067   0.00024
   4      0.0212    0.0210   -0.00023   0.00005    0.0308   0.0309    0.00009   0.00015
   5      0.0105    0.0105    0.00000   0.00002    0.0154   0.0144   -0.00100   0.00005
   6      0.0052    0.0049   -0.00033   0.00001    0.0077   0.0067   -0.00104   0.00003
for higher scales this integral was evaluated through numeric integration (Press et al. 1992, Ch. 4). From the table, we see that the estimates have, on average, negligible bias and very small mean squared error for either wavelet filter.
Figure 5.1 displays the distributions of the estimated variances $\widetilde{V}_j$, $j = 1, \ldots, 6$, between these uncorrelated white noise processes (N = 512). The estimates appear to be distributed symmetrically about their true value at all scales. There appears to be a slight increase in variability when using the D(4) wavelet filter over the Haar across all scales. Also shown are the distributions of the estimated variances $\widetilde{\widetilde{V}}_j$, $j = 1, \ldots, 6$. These estimates have skewed distributions and are negatively biased across all scales. As previously stated, the constraint on the acvs with unknown mean appears to be forcing the estimates towards $-V_j$.
Figure 5.1: Estimates of $\widetilde{V}_j$ (left column) and $\widetilde{\widetilde{V}}_j$ (right column), $j = 1, \ldots, 6$, for uncorrelated white noise processes (N = 512), based on M = 500 iterations. [Panels: Haar (top) and D(4) (bottom); rows: Level 6 down to Level 1.]
One of the simplest relationships between bivariate time series is linear regression with delay; see, e.g., Priestley (1981, pp. 663-664). If we have two time series $\{X_t\}$ and $\{Y_t\}$ with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively, that are related via
\[
Y_t = c X_{t-d} + \epsilon_t
\]
then, using properties of the Fourier transform (cf. Section A.2), we know their spectra are related via
\[
S_Y(f) = c^2 S_X(f) + \sigma_\epsilon^2 \Delta t. \tag{5.25}
\]
Their cross spectrum is given by $S_{XY}(f) = c\, e^{-i 2\pi f d \Delta t} S_X(f)$, with co-spectrum taking the form $R_{XY}(f) = c \cos(2\pi f d \Delta t) S_X(f)$ and quadrature spectrum $Q_{XY}(f) = c \sin(2\pi f d \Delta t) S_X(f)$. To simplify these expressions, assume $\{X_t\}$ is a white noise process with $\sigma_X^2 = 1$, let $c = 1$ and $\sigma_\epsilon^2 = 0$. Then Equation (5.6) reduces to
\[
c^2 \sigma_X^4 \int_{-1/2}^{1/2} \widetilde{H}_j^2(f)\,df + c^2 \sigma_X^4 \int_{-1/2}^{1/2} \widetilde{H}_j^2(f)\, e^{-i 4\pi f d}\,df
\]
and can be evaluated via numeric integration as was the case with uncorrelated processes.
Table 5.3: Empirical bias and mean squared error (mse) of $\widetilde{V}_j$, $j = 1, \ldots, 6$, for white noise processes which are related via linear regression with delay (N = 512), based on M = 500 iterations.

                        Haar                                      D(4)
 Level   V_j(Haar)   ave.      bias       mse      V_j(D4)   ave.      bias       mse
   1      0.7500    0.7430   -0.00697   0.01523    0.8203   0.8132   -0.00717   0.01858
   2      0.2188    0.2180   -0.00072   0.00200    0.2630   0.2612   -0.00105   0.00369
   3      0.0898    0.0890   -0.00084   0.00060    0.1240   0.1224   -0.00159   0.00147
   4      0.0425    0.0422   -0.00029   0.00030    0.0617   0.0611   -0.00057   0.00085
   5      0.0209    0.0195   -0.00141   0.00014    0.0308   0.0282   -0.00266   0.00044
   6      0.0104    0.0096   -0.00083   0.00005    0.0154   0.0129   -0.00250   0.00017
Table 5.3 gives the empirical bias and mean squared error for a simulation study based on two processes related via Equation (5.25). Again, we see a slightly larger bias and mean squared error when using the D(4) wavelet filter.
The distributions of the estimated variances $\widetilde{V}_j$, $j = 1, \ldots, 6$, between processes which are related via linear regression with delay are given in Figure 5.2. The estimates appear to be distributed symmetrically about their true value at all scales. There is a slight increase in variability when using the D(4) wavelet filter over the Haar across scales. Also shown are the distributions of the estimated variances $\widetilde{\widetilde{V}}_j$, $j = 1, \ldots, 6$. Again, these distributions are negatively biased about their true value.
Figure 5.2: Estimates of $\widetilde{V}_j$ (left column) and $\widetilde{\widetilde{V}}_j$ (right column), $j = 1, \ldots, 6$, minus their true value, for processes which satisfy a linear regression with delay relationship (N = 512), based on M = 500 iterations. [Panels: Haar (top) and D(4) (bottom); rows: Level 6 down to Level 1.]
5.5.4 Conclusions
We have compared two potential estimators, $\widetilde{V}_j$ and $\widetilde{\widetilde{V}}_j$, for the variance of the wavelet covariance. The former is based on using periodogram-based estimates of the integrals in Equation (5.6), while the latter uses an estimate of the autocovariance sequence in Equation (5.4). The variance estimate $\widetilde{V}_j$, defined in Equation (5.17), is an unbiased estimator of $V_j$ and has negligible mean squared error when considering uncorrelated or linear regression with delay processes. The alternative variance estimate $\widetilde{\widetilde{V}}_j$, defined in Equation (5.18), is a negatively biased estimator of $V_j$ with the bias approaching $-V_j$ for large $\widetilde{N}_j$.
Chapter 6
APPLICATIONS
In this chapter we apply the various techniques previously introduced, such as
testing homogeneity of variance in time series and analyzing bivariate time series
using wavelet estimators, to real data.
The Nile River minimum water levels (Toussoun 1925) is a time series of yearly measurements starting in 622 AD and continuing, with both large and small gaps of missing values, into the twentieth century. We analyze the first continuous piece of the series from 622 AD to 1284 AD. A key feature of the series is a marked increase in variability during the first century of measurements (Beran 1994, Sec. 10.3). We compare results from our wavelet analysis to those of Beran and Terrin (1996), who utilized a test statistic to detect a change in the long memory parameter in the time series. We find a significant change of variance around 720 AD, which coincides with the construction of an instrument, called a nilometer, in 715 AD.
A time series of vertical ocean shear measurements (Percival and Guttorp 1994), where the observations are indexed by depth rather than time, is analyzed to detect multiple variance changes in the series. Two bursts of increased variability occur towards the beginning and the end of the series. When comparing the series to 4096 observations in the middle of the series (used in Percival and Guttorp (1994)), there is increased variability in the first 5 scales only. Applying the multiple variance change detection procedure (Section 4.6) to this series yields a variety of significant variance changes in the first five scales. The two obvious bursts of variability (at 450 m and 1000 m) are adequately identified, and a third burst around 800 m appears in the first four scales.
The Madden-Julian oscillation (MJO) (Madden and Julian 1971) was originally discovered using bivariate spectral analysis, i.e., the lag window estimators of co-spectrum and magnitude squared coherence. Since then it has been identified and described by researchers in a variety of physical disciplines; see Madden and Julian (1994) for a review. I reanalyze the data used by Madden and Julian (1971) using multitaper spectral techniques and, more importantly, bivariate wavelet techniques developed in Chapter 5. The multitaper estimates of the co-spectrum and magnitude squared coherence show a much narrower period band for the MJO, primarily because the amount of smoothing has been drastically reduced with respect to corresponding lag window estimates. The estimated wavelet correlation and cross-correlation agree with the original findings. A peak in the estimated correlation occurs in the fifth scale, corresponding to changes of 16 days and frequencies $1/64 \le f \le 1/32$ cycles per day, between station pressure and 850 mb zonal wind, and between 150 mb and 850 mb zonal winds, with a small lead/lag relationship between the atmospheric variables.
While analyzing atmospheric time series collected at Canton Island (Madden and Julian 1971), we provided an empirical validation of the bivariate wavelet techniques, but did not take advantage of the time-localization properties which the wavelet transform possesses. With this in mind, we turn our attention to investigating the possible interaction between El Niño-Southern Oscillation (ENSO) events and the MJO. Using daily station pressure readings from Darwin, Australia, and Tahiti, French Polynesia, we construct a (daily) Southern Oscillation Index from roughly 1957 to 1992 by simply differencing the observations at these two stations; this is a measure of ENSO activity. Similar readings from Truk Island are used as a proxy for the MJO. A bivariate wavelet analysis is performed between these two atmospheric time series. We find a large peak in the fifth scale of the wavelet correlation corresponding to the MJO. The wavelet cross-correlation nicely "decomposes" the usual cross-correlation into a few distinct patterns. The time-varying structure of the wavelet variance and covariance is also qualitatively analyzed by partitioning them by season and ENSO activity.
[Plot of the series; vertical axis ticks from 1000 to 1400; horizontal axis: Year.]
Figure 6.1: Nile River minimum water levels for 622 AD to 1284 AD. These data can be obtained via the World Wide Web at http://lib.stat.cmu.edu/S/ under the title `beran'. This is the address for StatLib, a statistical archive maintained by Carnegie Mellon University.
of this time series as a long memory process began with the doctoral works of Mohr (1981) and Graf (1983). Both used Fourier transform (periodogram) analysis for estimating the self-similarity parameter of a fractional Gaussian noise model. Graf (1983) reported estimates of $H = d + \tfrac{1}{2}$ between 0.83 and 0.85.
Beran (1994, p. 118) has reported estimates of $H = 0.84$ for fractional Gaussian noise and $H = 0.90$ for a fractional ARIMA model with 95% confidence intervals of (0.79, 0.89) and (0.84, 0.96), respectively. He also established a goodness-of-fit test for the spectral density of a long memory process. An approximate p-value for the fractional Gaussian noise model of the yearly minimum water levels of the Nile River is 0.70, meaning that fractional Gaussian noise appears to fit the spectral density of the Nile River series well.
[Panels, top to bottom: Data, D1, D2, D3, D4, S4.]
Figure 6.2: Multiresolution analysis of the Nile River minimum water levels using the D(4) wavelet filter and MODWT. The top plot of the figure is the series itself, while the five time series plotted below it constitute an additive decomposition of the series into components associated with (from top to bottom) variations on scales of 1 year ($\widetilde{D}_1$), 2 years ($\widetilde{D}_2$), 4 years ($\widetilde{D}_3$), 8 years ($\widetilde{D}_4$) and 16 years or longer ($\widetilde{S}_4$). The vertical dotted line splits the series into two parts: the first 100 observations (from 622 to 721 AD) and the remaining 563 observations (722 to 1284 AD).
Each subseries $\widetilde{D}_j$ is associated with changes at scale $\lambda_j = 2^{j-1}$,
while $\widetilde{S}_J$ is associated with weighted averages over scales of $2^J$; see
Percival and Mofjeld (1997) for more details. We used the D(4) wavelet in conjunction
with the MODWT, extended to N coefficients at each scale by assuming periodic boundary
conditions. Visually it appears that there is greater variability in changes on scales
of 1 and 2 years prior to 722 AD, but not on longer scales. Beran (1994, Sec. 10.3)
investigated the question of a change in the long memory parameter in this time series
by partitioning the first 600 observations into two subseries containing, respectively,
the first 100 and the remaining 500 measurements. Estimates of the long memory
parameter d, using maximum likelihood, were quite different between the two subseries:
0.04 and 0.38, respectively. This analysis suggests a change in d, a conclusion that
was also drawn in Beran and Terrin (1996) using a procedure designed to test for a
change in the long memory parameter. We can perform a similar analysis using the
wavelet variance $\nu_Y^2(\lambda_j)$, which makes use of the DWT or MODWT to decompose
the variance of $\{Y_t\}$ on a scale by scale basis (cf. Section 3.4). The estimated
MODWT wavelet variances, given a partitioning scheme similar to the one used by Beran,
are displayed in Figure 6.3. We see that the 95% confidence intervals for scales of 1
and 2 years do not overlap, which agrees with the apparent change of variance for those
same scales in Figure 6.2.
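The scale-by-scale decomposition of variance can be illustrated with a minimal sketch.
It is not the computation used for the figures: it assumes the Haar wavelet filter
rather than D(4), uses only coefficients unaffected by the boundary, and omits the
confidence intervals of Percival (1995). The function names are hypothetical.

```python
import numpy as np


def haar_modwt_filter(level):
    """Level-`level` Haar MODWT wavelet filter (length 2**level)."""
    half = 2 ** (level - 1)
    return np.concatenate([np.full(half, 1.0), np.full(half, -1.0)]) / 2 ** level


def wavelet_variance(x, max_level):
    """Haar MODWT wavelet variance estimates for levels 1..max_level.

    Only coefficients unaffected by the boundary are used, so no
    periodic extension of the series is needed.
    """
    x = np.asarray(x, dtype=float)
    variances = {}
    for level in range(1, max_level + 1):
        h = haar_modwt_filter(level)
        if h.size > x.size:
            break
        w = np.convolve(x, h, mode="valid")  # non-boundary coefficients
        variances[2 ** (level - 1)] = float(np.mean(w ** 2))
    return variances  # keyed by scale 2**(level - 1)
```

Applying `wavelet_variance` separately to the first 100 values and to the remaining
values of a series gives the kind of before-and-after comparison described above.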
Figure 6.3: Estimated D(4) wavelet variances for the Nile River minimum water levels
before and after the year 722 AD (622-721 AD and 722-1284 AD), plotted against scale in
years, along with 95% confidence intervals based upon a chi-square approximation given
in Percival (1995).

For a fractional difference process we have $\nu_Y^2(\lambda_j) \propto \lambda_j^{2d-1}$
approximately, so we can estimate d by regressing $\log \tilde{\nu}_Y^2(\lambda_j)$ on
$\log \lambda_j$ and using the estimated slope $\hat{\beta}$ to form
$\hat{d} = \frac{1}{2}(\hat{\beta} + 1)$ (Percival and Walden 1999, Sec. 8.1). This
procedure yields estimates of $\hat{d} = 0.38$, 0.42 and $-0.07$ for, respectively, the
whole time series, the last 563 observations and the first 100 observations. These
compare favorably with Beran's values of 0.40, 0.38 and 0.04, but it is clear from
Figure 6.3 that the smaller value for $\hat{d}$ in the first 100 years is due to
increased variability at scales of 2 years or less. The observed difference in
variability at longer scales between the first and last portions of the time series is
consistent with sampling variability.
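The log-log regression just described can be sketched as follows. This is an
unweighted least-squares fit on synthetic wavelet variances that follow the power law
exactly; the Nile estimates themselves are not reproduced, and the function name is
hypothetical.

```python
import numpy as np


def estimate_d(scales, wavelet_variances):
    """Least-squares slope of log(variance) on log(scale), mapped to d."""
    log_scale = np.log(np.asarray(scales, dtype=float))
    log_var = np.log(np.asarray(wavelet_variances, dtype=float))
    slope, _intercept = np.polyfit(log_scale, log_var, 1)
    return 0.5 * (slope + 1.0)


# Synthetic check: variances that follow the power law exactly with d = 0.4.
scales = 2.0 ** np.arange(7)              # 1, 2, 4, ..., 64
variances = 3.0 * scales ** (2 * 0.4 - 1)
d_hat = estimate_d(scales, variances)
```

With real estimates the wavelet variances at large scales are based on fewer
coefficients, so a weighted regression may be preferable to the plain fit shown here.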
Table 6.1: Results of testing the Nile River minimum water levels for homogeneity
of variance (N = 663) using the Haar wavelet filter with Monte Carlo critical values.
As shown in the table, the test statistic at scale 1 is significant at the 1% level, and
the test statistic at scale 2 is significant at the 5% level.

Scale (years)     D        10% critical level   5% critical level   1% critical level
      1         0.1559          0.0945               0.1051              0.1262
      2         0.1754          0.1320               0.1469              0.1765
      4         0.1000          0.1855               0.2068              0.2474
      8         0.2313          0.2572               0.2864              0.3436
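The decisions read off Table 6.1 can be checked mechanically. The sketch below simply
compares each test statistic D with its three critical values; the numbers are those in
the table, while the variable and function names are hypothetical.

```python
CRITICAL_LEVELS = ("10%", "5%", "1%")

# scale: (D, 10% critical value, 5% critical value, 1% critical value)
TABLE_6_1 = {
    1: (0.1559, 0.0945, 0.1051, 0.1262),
    2: (0.1754, 0.1320, 0.1469, 0.1765),
    4: (0.1000, 0.1855, 0.2068, 0.2474),
    8: (0.2313, 0.2572, 0.2864, 0.3436),
}


def strictest_level(d_stat, critical_values):
    """Return the smallest significance level at which D exceeds its critical value."""
    result = None
    for label, value in zip(CRITICAL_LEVELS, critical_values):
        if d_stat > value:
            result = label
    return result


decisions = {scale: strictest_level(row[0], row[1:]) for scale, row in TABLE_6_1.items()}
print(decisions)
```

Scales 4 and 8 map to `None`, that is, no rejection even at the 10% level.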
With a change of variance detected in the first and second scales, we can apply
the methodology from Section 4.5 to locate these change points. Figure 6.4 displays
the normalized cumulative sum of squares as a function of wavelet coefficient for the
first two scales. We see a sudden accumulation of variance in the first 100 years and
a gradual tapering off of the variance afterwards (by construction the series must
begin and end at zero). The maximum is actually attained in 720 AD for the level 1
coefficients and 722 AD for level 2. The subsequent smaller peaks occurring in the
ninth century are associated with large observations, as seen in the original series,
not changes in the variance of the time series.
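To make the locating step concrete, here is a minimal sketch of a normalized cumulative
sum of squares and of the index at which it departs most from a straight line. The
normalization $P_k - k/N$ is an assumption; the exact form used in Section 4.5 may
differ slightly. The input is a toy sequence, not the Nile coefficients.

```python
import numpy as np


def normalized_cusum_of_squares(w):
    """Return (D, k_hat, deviation) for a sequence of wavelet coefficients.

    deviation[k-1] = P_k - k/N, where P_k is the fraction of the total sum
    of squares accumulated by the first k coefficients.
    """
    w = np.asarray(w, dtype=float)
    energy = np.cumsum(w ** 2)
    k = np.arange(1, w.size + 1)
    deviation = energy / energy[-1] - k / w.size
    k_hat = int(np.argmax(np.abs(deviation))) + 1
    return float(np.abs(deviation).max()), k_hat, deviation


# Toy example: the first 100 coefficients are three times larger than the rest.
w = np.concatenate([3.0 * np.ones(100), np.ones(400)])
d_stat, k_hat, deviation = normalized_cusum_of_squares(w)
```

In this toy case the deviation rises steadily over the first 100 coefficients and then
decays to zero, the same qualitative shape as described for Figure 6.4.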
Figure 6.4: Normalized cumulative sum of squares from the first two scales of the
MODWT for the Nile River minimum water levels (panels, from top to bottom: Data, D2,
D1; horizontal axis: years). The vertical dotted line is at 715 AD.
The source document for this series (Toussoun 1925) and subsequent historical
studies by Popper (1951) and Balek (1977, Ch. 1) all indicate the construction in
715 AD of a "nilometer" in a mosque on Roda Island in the Nile River near Cairo.
The yearly minimum water levels for 715 AD to 1284 AD were measured using this
device, or a reconstruction of it done in 861 AD. The precise source of measurements
for 622 AD to 714 AD is unknown, but they were most likely made at different
locations around Cairo, with possibly different types of measurement devices, of less
accuracy than the one in the Roda Island mosque. Our estimated change point at
720 or 722 AD coincides well with the construction of this new instrument in 715 AD,
and it is reasonable that this new nilometer led to a reduction in variability at the
very smallest scales.
Beran and Terrin (1996) looked at the Nile River minimum water levels and
used a test statistic to argue for a change in the long memory parameter in the
time series. The results from our analysis, in conjunction with an examination of
the historical record, suggest an alternative interpretation: there is a decrease in
variability at scales of 2 years and less after about 720 AD, and this decrease
is due to a new measurement instrument rather than to a change in the long term
characteristics of the Nile River.
Figure 6.5: Plot of vertical shear measurements (inverse seconds) versus depth
(meters). The two vertical lines are at 489.5 m and 899.0 m, and denote the roughly
stationary series used by Percival and Guttorp (1994). This series can be obtained
via the World Wide Web at http://lib.stat.cmu.edu/datasets/ under the title
"lmpavw".
stationary section in between. Percival and Guttorp (1994) commented on this fact
and only looked at 4096 observations ranging from 489.5 m to 899.0 m in their paper.
Wang, Cavanaugh, and Song (1997) analyzed the full time series in order to estimate
a time-varying self-similarity parameter using the DWT. We propose to apply the
methodology for detecting and locating multiple variance changes (cf. Section 4.6)
to this geophysical series.
Figure 6.6 gives a multiresolution analysis of the ocean shear time series using
the D(4) wavelet. The eight time series plotted constitute a portion of an additive
decomposition of the series into components associated with, from top to bottom,
variations on scales of 0.1 meters ($\widetilde{D}_1$), 0.2 meters ($\widetilde{D}_2$),
0.4 meters ($\widetilde{D}_3$), 0.8 meters ($\widetilde{D}_4$), 1.6 meters
($\widetilde{D}_5$), 3.2 meters ($\widetilde{D}_6$), 6.4 meters ($\widetilde{D}_7$) and
12.8 meters ($\widetilde{D}_8$). We see a persistence of the increased variability in
the first 5 scales around 1000 m, and
Figure 6.6: Multiresolution analysis of the vertical ocean shear measurements using
the D(4) wavelet filter and maximal overlap discrete wavelet transform. The first
eight details $\widetilde{D}_1$ to $\widetilde{D}_8$ are displayed with each series on
the same vertical scale. The two vertical lines are at 489.5 m and 899.0 m, and denote
the wavelet coefficients used by Percival and Guttorp (1994).
Figure 6.7: Estimated wavelet variance of the vertical ocean shear measurements
using the D(4) wavelet filter and MODWT. The light grey confidence intervals
correspond to all 6875 observations (N = 6875), while the dark grey confidence
intervals correspond to the middle 4096 observations (N = 4096) as analyzed in Percival
and Guttorp (1994).
The influence of the ends of the time series (i.e., the observations outside the
vertical dotted lines in Figure 6.5) is most evident when comparing its wavelet variance
to the wavelet variance of the middle 4096 observations; see Figure 6.7. The
bursts of increased variability observed in the first 5 scales make a significant
contribution to the wavelet variance. For those scales, the confidence intervals do not
overlap between the full and truncated time series, whereas the confidence intervals
do overlap for all subsequent scales. As with the Nile River minimum water levels,
this feature hints at a possible heterogeneity of variance in the first 5 scales.
Whether these data should be classified as having long-range dependence is not
obvious. The roll-off of the wavelet variance at the higher scales (lower frequencies)
does not fit with the general framework of a fractional difference process. Wang et
al. (1997) estimated a time-varying long memory parameter for these measurements. The
middle of the series has a roughly constant long memory parameter between 0.65 and
0.70, while the ends of the series exhibit much greater long memory parameters. I will
not concentrate on modeling this process as a globally or locally self-similar process,
but instead investigate the nonstationary features through testing for homogeneity of
variance on a scale by scale basis.
Figure 6.8 shows the MODWT wavelet coefficients for the first five scales of the
vertical ocean shear measurements. The vertical dotted lines are the estimated
locations of variance change points, using the DWT to test and the MODWT to locate,
with asymptotic critical values ($\alpha = 0.05$). The procedure does a good job of
isolating the two regions of increased variability at 450 m and 1000 m in each scale,
except for the second scale. There, the first burst has been "picked apart" by the
procedure into 10 distinct stationary regions. This does not seem appropriate, and it
is unclear why this only occurred on the second scale when the third scale appears to
be similar in changing variability with time. Besides the two obvious regions of
increased variability, there appears to be a third burst around 800 m. It is present,
to differing degrees, in the first four scales, whereas most other bursts disappear
after the first and second scale. This is a much more subtle type of nonstationarity,
compared to the obvious bursts at 450 m and 1000 m, and not particularly visible in
the original time series with the naked eye.
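The idea of locating several variance changes can be sketched with a generic binary
segmentation. This is not the procedure of Section 4.6, which tests with DWT
coefficients and locates with MODWT coefficients. Here a single sequence is split
recursively, using the familiar $\sqrt{N/2}$ scaling of a cumulative-sum-of-squares
statistic and 1.358 as an approximate 5% asymptotic critical value. The minimum segment
length and all names are assumptions made for illustration.

```python
import numpy as np

CRITICAL_95 = 1.358  # approximate 95% point of sup|Brownian bridge|


def cusum_statistic(w):
    """Return (D, k) for the normalized cumulative sum of squares of w."""
    energy = np.cumsum(np.asarray(w, dtype=float) ** 2)
    if energy[-1] == 0.0:
        return 0.0, 0
    k = np.arange(1, energy.size + 1)
    deviation = np.abs(energy / energy[-1] - k / energy.size)
    best = int(np.argmax(deviation))
    return float(deviation[best]), best + 1


def locate_changes(w, start=0, min_length=16):
    """Recursively split w wherever the scaled statistic exceeds CRITICAL_95."""
    n = len(w)
    if n < 2 * min_length:
        return []
    d_stat, k = cusum_statistic(w)
    if np.sqrt(n / 2.0) * d_stat <= CRITICAL_95 or k < min_length or n - k < min_length:
        return []
    left = locate_changes(w[:k], start, min_length)
    right = locate_changes(w[k:], start + k, min_length)
    return left + [start + k] + right
```

On noisy coefficients a scheme of this kind can over-segment a single burst, which may
be related to the behaviour seen on the second scale.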
Figure 6.8: Estimated locations of variance change for the vertical ocean shear
measurements using the D(4) wavelet filter, displayed on the MODWT wavelet coefficients
(panels, from top to bottom: levels 5 to 1; coefficients in 1/s against depth in
meters). Only the first five scales were found to have significant changes of variance.
Asymptotic critical values were used for the hypothesis testing at the $\alpha = 0.05$
level of significance.
This algorithm for detecting and locating multiple variance changes via the DWT
is in its infancy. More work is needed in order to refine the procedure and investigate
its properties. Given the ability of the DWT to remove heavy amounts of autocorrelation
in time series, this method has wide application in many fields. The test can handle
high amounts of autocorrelation, as found in stationary long memory processes, and
has the further advantage that only limited assumptions are made with respect to the
underlying spectrum of the observed physical process.
Figure 6.9: Climate stations in the tropical Pacific Ocean (Truk, Canton, Darwin and
Tahiti). The horizontal line is the equator, plotted for reference. The horizontal
range is roughly from 110° E to 140° W and the vertical range is 45°.
a Nyquist frequency of $f_{(N)} \equiv 1/(2\,\Delta t) = 1/2$ cycles/day. For days with
no measurements recorded, an ARIMA(3,1,0) model was fit and one-step-ahead predictions
were used to fill in the gaps (Jones 1980). The majority of missing values were
isolated observations, except for a week of missing data between 5 January 1965 and
10 January 1965. This gives us three length 3591 time series, which are shown in
Figure 6.10. The ragged look of each series is because no decimal places were kept
for any measurement.
Figure 6.10: Atmospheric time series collected from Canton Island (2.8° S, 171.7° W)
over the period 1 June 1957 to 31 March 1967. From top to bottom, they are station
pressure (in hPa), wind speed at 850 mb and wind speed at 150 mb (both in km/h).

We first analyze each time series separately, starting with the periodogram, and
then apply a variety of techniques to investigate any potential sources of bias.
Conclusions drawn here are compared with those found in Madden and Julian (1971).
The lag window spectral estimates utilized in the original paper are reproduced, as
best as possible, and compared with multitaper spectral estimates. After looking at
the three series independently, we perform a bivariate spectral analysis between the
three possible pairings of the time series. Lag window estimates of the co-spectrum
and magnitude squared coherence are contrasted with their corresponding multitaper
estimates. A bivariate wavelet analysis is then performed in order to see how these
new techniques compare to classical spectral analysis.
'tapering' the first and last 10% of the resulting N members by multiplication
by a segment of the cosine curve so that the ends of the series are
zero, and 3) performing the FFT to obtain N/2 harmonic coefficients. The
squared amplitudes or modified periodogram estimates are then averaged
by a running average of length L coefficients; this averaging producing
an estimate of the continuous spectra viewed through a rectangular spectral
window of bandwidth equal to $(2L/N) f_N$ where $f_N$ is the Nyquist
frequency."
and reference Bingham et al. (1967) for their Fourier methodology.
In order to reproduce the results of Madden and Julian (1971), some preliminary
interpretations and calculations must be performed. From the quotation given above,
they appear to have utilized a lag window spectral estimate using the Daniell smoothing
window. From p. 703, "The value of L was chosen so that the bandwidth of the
spectral window was 0.0081 day$^{-1}$." Using the formula from Table 269 in Percival
and Walden (1993), we can compute the window parameter

$$ m = \frac{1}{B_W \, \Delta t} = \frac{1}{0.0081} \approx 123. $$

For the Daniell smoothing window, the parameter m controls the amount of averaging
across frequencies of the spectral estimate: the smaller the m, the more averaging
occurs. Upon comparing our lag window spectral estimators with those in the original
work, they do not appear to have the same degree of smoothness. Two possible
explanations for this difference are that they did not use lag window spectral
estimators as defined in Percival and Walden (1993), or that the lag window spectral
estimates were smoothed again, possibly by splines, before publication. Regardless, we
can obtain a reasonably smooth spectral estimator by replacing the Daniell smoothing
window with the Parzen smoothing window. A recalculation of $m \approx 228$ for the
Parzen smoothing window is required, where m is now a truncation point such that the
acvs is zero for lags greater than m. The left column of plots in Figure 6.11 shows
these lag window spectral estimates. The station pressure spectrum is very close to the
one displayed in Madden and Julian (1971).
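The two window parameters can be recovered with a short calculation. The sketch assumes
that the smoothing window bandwidth has the form $B_W = c/(m\,\Delta t)$, with $c = 1$
for the Daniell window as in the display above and $c = 1.85$ for the Parzen window.
The Parzen constant is an assumption on my part, though it is consistent with the
quoted $m \approx 228$. The function name is hypothetical.

```python
def window_parameter(bandwidth, dt=1.0, constant=1.0):
    """Solve bandwidth = constant / (m * dt) for the window parameter m."""
    return constant / (bandwidth * dt)


B_W = 0.0081  # smoothing window bandwidth in cycles/day (Madden and Julian 1971)

m_daniell = window_parameter(B_W, dt=1.0, constant=1.0)
m_parzen = window_parameter(B_W, dt=1.0, constant=1.85)  # assumed Parzen constant

print(round(m_daniell), round(m_parzen))
```

Holding the bandwidth fixed, a window with a larger constant needs a proportionally
larger m, which is why the Parzen value is nearly twice the Daniell value.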
As an alternative to lag window spectral estimation, we apply multitaper spectral
estimation (Thomson 1982) to these series. Several dpss data tapers, which are
orthogonal and normalized to have unit energy, are applied to the time series. The
modulus squared Fourier transforms of these tapered series (also known as
eigenspectra) are then averaged together at each frequency. The right column of plots
in Figure 6.11 shows these multitaper spectral estimates. Although these spectral
estimates are much less ragged than the periodogram, they are far from the smoothness
of a lag window spectral estimate. We see an "annual" peak near zero frequency in the
three multitaper spectra, with the peak being the largest in the 150 mb wind speed
series. This is most likely due to its relatively flat background spectrum. Notice that
the multitaper spectral estimate for the 150 mb wind speed series agrees with the
periodogram in shape. This is contrary to the result using a direct spectral estimator
(dpss, NW = 4) stated at the beginning of this section.
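The mechanics of a multitaper estimate can be sketched in a few lines. To keep the
example self-contained it swaps in sine tapers, which have a closed form, for the dpss
tapers used in this analysis, so it illustrates the averaging of eigenspectra rather
than reproducing Figure 6.11. The function names are hypothetical.

```python
import numpy as np


def sine_tapers(n, k):
    """Return a (k, n) array of orthonormal sine tapers."""
    t = np.arange(1, n + 1)
    orders = np.arange(1, k + 1)[:, None]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * orders * t / (n + 1))


def multitaper_spectrum(x, k=5, dt=1.0):
    """Average of k eigenspectra; returns (frequencies, spectrum estimate)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    tapers = sine_tapers(x.size, k)
    eigenspectra = dt * np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs, eigenspectra.mean(axis=0)
```

Each eigenspectrum is a direct spectral estimate with a different taper; averaging
K = 5 of them at each frequency gives roughly 2K = 10 equivalent degrees of freedom.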
There is an issue in how to handle the annual cycle in these data. From Madden
and Julian (1971, p. 703),

Given the degree of smoothing involved in the lag window spectral estimates, this
does not appear to be necessary. The plots in the left column of Figure 6.11 were not
adjusted this way and show no obvious difference for frequencies close to the annual
frequency. The broad-band feature apparent in the lag window spectral estimates is
seen in the multitaper spectral estimates as multiple peaks in the frequency band of
interest.

Figure 6.11: Univariate spectral analysis of Canton Island data (panels, from top to
bottom: SLP, W150, W850; spectra in dB against frequency in cycles/day, 0 to 0.08).
The left column shows the lag window spectral estimates using the Parzen smoothing
window (m = 228) of the three atmospheric series, with the dots representing the
periodogram estimates. The right column shows the corresponding multitaper spectral
estimates using K = 5 dpss data tapers (NW = 4).
While the periodogram and direct spectral estimates are useful in the univariate case
for data analysis, they are not appropriate when moving into the realm of multivariate
spectral analysis of time series. This is because important statistical quantities,
such as the mean squared coherence (msc), are unity over all frequencies when
calculated through these methods; see Priestley (1981, p. 708) for an explanation of
this result. Hence, we concentrate our efforts on contrasting lag window bivariate
spectral estimators (as used in the original study) with multitaper bivariate spectral
estimators.
We first look at the co-spectra between station pressure and 850 mb wind speed,
and between 150 mb wind speed and 850 mb wind speed. Figure 6.12 shows estimates of
the co-spectra using a lag window spectral estimator and a multitaper spectral
estimator. The lag window co-spectra are similar in shape to those reported in the
original paper, differing only in magnitude. The multitaper co-spectra exhibit
multiple peaks in the frequency range of the broad peaks in the lag window estimates,
with large peaks around f = 0.025 being the most dominant feature in that frequency
band.
The left column of Figure 6.13 shows the estimated lag window msc for pairwise
comparisons between the three atmospheric time series. We can test, at the $\alpha$
level of significance, the null hypothesis of zero msc by checking the estimated msc,
on a frequency by frequency basis, against $1 - \alpha^{2/(\nu - 2)}$ and rejecting if
the estimated msc exceeds it (Koopmans 1974, p. 284). The parameter $\nu$ is the number
of equivalent degrees of freedom associated with the spectral estimates, which is
identical to the
Figure 6.12: Estimated co-spectra for the Canton Island data (SLP / 850 mb and
150 mb / 850 mb; estimated co-spectrum against frequency). The left panel displays
the lag window co-spectral estimates using a Parzen smoothing window (m = 228)
and the right panel displays the multitaper co-spectral estimates using K = 5 dpss
data tapers (NW = 4).
Figure 6.13: Mean squared coherence of the Canton Island data (estimated msc against
frequency in cycles/day, 0 to 0.08). The two horizontal lines in each plot are the
$\alpha = 0.05$ (dotted) and $\alpha = 0.01$ (dashed) levels of the significance test
for non-zero msc. The left column contains the lag window spectral estimates and the
right column contains the corresponding multitaper spectral estimates.
comparisons between the three atmospheric time series. The first five dpss data
tapers (NW = 4) were used to compute these estimates. We can test for non-zero
msc as before. Using K = 5 data tapers gives us $\nu = 2K = 10$ degrees of freedom.
We reject the null hypothesis of zero msc in all three plots around $f \approx 0.0036$,
which corresponds to a period of around 276 days. A second peak is found around
$f \approx 0.0058$, which corresponds to a period of around 171 days, between station
pressure and 850 mb wind speed. The group of frequencies which peak near the
frequencies of the Madden-Julian oscillation cover a period of around 37-40 days,
slightly shorter and much narrower a range than the 41-53 days observed in
Madden and Julian (1971).
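The rejection threshold for the test of zero msc is easy to evaluate. The sketch below
assumes the threshold $1 - \alpha^{2/(\nu - 2)}$ with $\nu = 2K$ equivalent degrees of
freedom for K data tapers; the function name is hypothetical.

```python
def msc_critical_value(alpha, dof):
    """Threshold 1 - alpha**(2 / (dof - 2)) for a test of zero msc."""
    if dof <= 2:
        raise ValueError("need more than 2 equivalent degrees of freedom")
    return 1.0 - alpha ** (2.0 / (dof - 2.0))


K = 5                 # number of data tapers
dof = 2 * K           # equivalent degrees of freedom
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject zero msc above {msc_critical_value(alpha, dof):.3f}")
```

With 10 degrees of freedom the thresholds are roughly 0.53 and 0.68. With only 4
degrees of freedom the 5% threshold rises to 0.95, so very few estimated msc values
could ever be declared significant.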
Several analyses of the estimated multitaper msc were performed between station
pressure and 150 mb wind speed in order to determine if significant bias is introduced
by using too many data tapers. Hypothesis testing using 2 data tapers is difficult
given only $\nu = 4$ degrees of freedom. For 3 data tapers a small peak occurs around
frequency f = 0.0214 (approximately a 47 day oscillation), but the msc appears to
be contaminated with several spikes from the still high variability of the multitaper
estimate. With 4 or more data tapers, nothing except the annual frequency appears to
be significant. Hence, while some leakage may be present in this series, the hypothesis
of non-zero mean squared coherence cannot be tested without a sufficient number of
data tapers.
Figure 6.14b: Multiresolution analysis of 150 mb wind speed series collected at Canton
Island (2.8° S, 171.7° W) using the D(4) wavelet and MODWT. The wavelet details
$\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_8$ are
associated with variations on scales of 1, 2, 4, ..., 128 days and the wavelet smooth
$\widetilde{S}_8$ is associated with variations of 256 days or longer.
Figure 6.14c: Multiresolution analysis of 850 mb wind speed series collected at Canton
Island (2.8° S, 171.7° W) using the D(4) wavelet and MODWT. The wavelet details
$\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_8$ are
associated with variations on scales of 1, 2, 4, ..., 128 days and the wavelet smooth
$\widetilde{S}_8$ is associated with variations of 256 days or longer.
Figure 6.15: MODWT estimated wavelet variance for Canton Island time series using
a D(4) wavelet filter (wavelet variance against scale in days, 1 to 128). The station
pressure series is plotted in both the upper and lower plots for reference. The shaded
regions form an approximate 95% confidence interval.
showed no strong patterns with a range of $-0.19$ to 0.20 for scale 5. When comparing
sea level pressure and 850 mb wind speed, they are most positively correlated at a
lag of +2 days ($\tilde{\rho}_{2,850/\mathrm{SLP}}(\lambda_5) = 0.412$), and when
comparing 150 mb and 850 mb wind speed, they are most negatively correlated at a lag
of +1 day ($\tilde{\rho}_{1,150/850}(\lambda_5) = -0.309$). These results, which
correspond to the 850 mb wind speed trailing the station pressure by 2 days and the
850 mb wind speed leading the 150 mb wind speed by 1 day, agree with findings in
Madden and Julian (1971), where the 850 mb wind was nearly in phase
($\approx 10^\circ$) with station pressure and the two winds were found to be almost
out of phase ($177^\circ$).

Figure 6.16: MODWT estimated wavelet correlation for Canton Island time series
using a D(4) wavelet filter (wavelet correlation against scale in days, 1 to 128). The
plots are, from top to bottom, station pressure versus 150 mb wind, station pressure
versus 850 mb wind, and 150 mb wind versus 850 mb wind. The shaded regions form
approximate 95% confidence intervals.
6.3.5 Conclusions
It is difficult to be overly critical of the time series analysis techniques of Madden
and Julian (1971). Given the year of the discovery, they used the most reasonable
techniques available, namely, lag window bivariate spectral estimates. The amount
of tapering applied to the series (10% cosine) appears adequate when compared to
stronger data tapers. Their results led to the discovery of a broad-band feature in
atmospheric readings from the tropical Pacific Ocean.
Utilizing the multitaper techniques of Thomson (1982), we obtain "smoothed
versions" of univariate and bivariate spectral estimators. Given that the spectral
bandwidth of a multitaper spectral estimator is smaller than that of a corresponding
lag window spectral estimator (in general), we will not over-smooth the spectra and
potentially lose interesting features. In the frequency range of interest, we observe
several peaks instead of one broad peak in the univariate spectra. This translates
into a very choppy estimated co-spectrum and magnitude squared coherence. When
testing the magnitude squared coherence, we observe a period of 37-40 days instead
of the 41-53 day oscillation reported in Madden and Julian (1971).
We have shown how wavelet analysis techniques have captured, and adequately
summarized, information about the Madden-Julian oscillation. The ability of the
DWT to approximately bandpass filter a time series alleviates some of the
pre-processing performed in spectral analysis of atmospheric time series, such as
removal of annual, semi-annual, and seasonal trends. These will naturally be
partitioned by the DWT. Wavelet techniques also open up the possibility of answering
questions about how the time series vary with time.
and in the Pacific Ocean showed a much stronger 40-50 day oscillation during the
Australian monsoon season from December to March, than the rest of the year. The
40-50 day cloud amount oscillation did not appear to be affected by warm ENSO
events. Madden and Julian (1994) note the broadband nature of the oscillation by
comparing the station pressure spectra for Truk Island (7.4° N, 151.8° W) during two
time spans, 1967 to 1979 and 1980 to 1985. The MJO appears to have a 26-day
period in the early 1980s.
The relationship between ENSO events and the MJO is a topic which could benefit
markedly from the use of wavelet techniques. To investigate how these two atmospheric
phenomena interact, we will analyze two time series. The first is the Southern
Oscillation Index (SOI), which is an indicator of ENSO and is usually defined to be
the difference between monthly averages of the station pressure series from climate
stations at Darwin, Australia (130.8° E, 12.4° S) and Tahiti, French Polynesia
(149° W, 14° S); see Figure 6.9 for the locations of these climate stations. It was
first introduced by Walker (1928) and came from the observation that pressure in the
tropical Pacific Ocean is inversely related to pressure in the Indian Ocean.
In our case, we deviate from the usual definition of the SOI by introducing a daily
version of it. I obtained daily pressure readings from Darwin, Australia, starting
on 1 June 1957 and continuing to 31 December 1992 (N = 12,998) and differenced
them; see Figure 6.17. The distance of the stations from the equator is apparent in the
strong annual component in the time series. The measurements in the summer and
winter of 1983 appear to be higher than those in adjacent years. This approximately
corresponds to a large ENSO event in the early 1980s. Any missing values were
filled in using one-step-ahead predictions from an ARIMA(3,1,0) model applied to
the series (Jones 1980).
I also obtained daily station pressure readings from Truk Island (7.4° N, 151.8° W)
as an indicator of the MJO. This series also exists from 1 June 1957 to 31 December
1992; see Figure 6.17. Unlike the SOI, there is no apparent annual trend since
the station is quite close to the equator. Missing values were dealt with in the same
manner as described for the SOI.

Figure 6.17: Station pressure series (in mb) for Truk Island (7.4° N, 151.8° W) and
the Southern Oscillation Index. The "staggered" look of the Truk Island series prior
to 1971 is the result of rounding to the nearest millibar.
We now propose to analyze the SOI and Truk Island station pressure series using
standard time-domain (e.g., the cross-correlation sequence) and Fourier (e.g., the
cross-spectrum) techniques. The cross-correlation sequence (ccs) is typically
estimated by

$$ \hat{\rho}^{(p)}_{\tau,XY} = \frac{\hat{C}^{(p)}_{\tau,XY}}{\left[ \hat{s}^{(p)}_{0,X} \, \hat{s}^{(p)}_{0,Y} \right]^{1/2}} $$

(see, e.g., Brockwell and Davis (1991, p. 29)), utilizing the periodogram-based
estimates of the acvs for $\{X_t\}$ and $\{Y_t\}$, and of the ccvs. The estimated
cross-correlation sequence for the SOI and Truk Island series is shown in Figure 6.18.
The maximum occurs at a lag of +1 days. We also observe the characteristic broad-band
peak commonly found in atmospheric time series from this region, with an approximate
range of 35-55 day lags.
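A minimal sketch of such a cross-correlation estimate is given below. It uses the
biased (divide by N) cross-covariance and lag-zero variances; the sign convention for
the lag, in which lag $\tau$ pairs $X_t$ with $Y_{t+\tau}$, is an assumption and may
differ from the one used for Figure 6.18. The function name is hypothetical.

```python
import numpy as np


def cross_correlation(x, y, max_lag):
    """Biased cross-correlation estimates for lags -max_lag..max_lag.

    Convention (an assumption): lag tau pairs x[t] with y[t + tau].
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = x.size
    scale = np.sqrt(np.mean(x ** 2) * np.mean(y ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    ccs = np.empty(lags.size)
    for i, tau in enumerate(lags):
        if tau >= 0:
            cov = np.sum(x[: n - tau] * y[tau:]) / n
        else:
            cov = np.sum(x[-tau:] * y[: n + tau]) / n
        ccs[i] = cov / scale
    return lags, ccs
```

Under this convention, a series y that repeats x two steps later produces its largest
cross-correlation at lag +2.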
Figure 6.18: Estimated cross-correlation sequence (ccs) for the Southern Oscillation
Index and Truk Island station pressure series, plotted against lag in days.
A spectral analysis of these data provides very little insight into the possible
relationship between ENSO events and the MJO. The multitaper co-spectrum between
the SOI and Truk Island station pressure series exhibits large peaks at annual and
inter-annual frequencies, and only a very slight peak in the frequency range of the
MJO. With the co-spectrum producing values so close to zero, the multitaper msc is
very erratic and gives a large number of significant peaks over $0 \le f \le 0.08$.
Hence, classical bivariate spectral estimation of these series does not exhibit any
indication of a possible relationship between these two series.
[Figure 6.19 plot omitted; panels, top to bottom: D1 through D10 and S10.]
Figure 6.19: Multiresolution analysis of the station pressure series collected at
Truk Island (7.4°N, 151.8°E), from June 1957 through December 1992, using the D(4)
wavelet filter and the MODWT. The wavelet details
$\widetilde{\mathcal D}_1, \widetilde{\mathcal D}_2, \widetilde{\mathcal D}_3, \ldots, \widetilde{\mathcal D}_{10}$
are associated with variations on scales of 1, 2, 4, ..., 1024 days and the wavelet
smooth $\widetilde{\mathcal S}_{10}$ is associated with variations of 2048 days or longer.
[Figure 6.20 plot omitted; panels, top to bottom: D1 through D10 and S10.]
Figure 6.20: Multiresolution analysis for the daily Southern Oscillation Index, from
June 1957 through December 1992, using the D(4) wavelet filter and the MODWT.
The wavelet details
$\widetilde{\mathcal D}_1, \widetilde{\mathcal D}_2, \widetilde{\mathcal D}_3, \ldots, \widetilde{\mathcal D}_{10}$
are associated with variations on scales of 1, 2, 4, ..., 1024 days and the wavelet
smooth $\widetilde{\mathcal S}_{10}$ is associated with variations of 2048 days or longer.
[Figure 6.21 plot omitted; vertical axis: wavelet variance.]
Figure 6.21: MODWT estimated wavelet variance for the Southern Oscillation Index
and Truk Island station pressure series.
quite large. The Truk Island estimated wavelet variance appears to follow the SOI
estimates in shape, with much less emphasis at the semi-annual and annual scales.
Figure 6.22 shows the estimated wavelet correlation between the SOI and Truk
station pressure series at a lag of zero days. The wavelet correlation appears to be
significantly different from zero for all scales except $\lambda_6$ and $\lambda_7$,
giving moderately positive correlations. These results are liberal for scales
$\lambda_7$ and $\lambda_8$ because the confidence intervals assume approximately zero
correlation between the products of DWT wavelet coefficients. This is not true for
scales $\lambda_7$ and $\lambda_8$, since they involve the semi-annual and annual
oscillations, and residual autocorrelations in the time-varying wavelet covariance
persist. The significant correlation at $\lambda_5$ lends credibility to the hypothesis
of an association between the SOI and MJO.
[Figure 6.22 plot omitted; vertical axis: wavelet correlation, from -1.0 to 1.0.]
Figure 6.22: MODWT estimated wavelet correlation for the Southern Oscillation
Index and Truk Island station pressure series. The transformed confidence intervals
were computed using Section 5.4.2.
[Figure 6.23 plot omitted; panels, top to bottom: d1 through d10.]
Figure 6.23: MODWT estimated wavelet cross-correlation for the Southern Oscillation
Index and Truk Island station pressure series for lags up to 240 days. The
confidence intervals from Figure 6.22 apply on a point-to-point basis. The positive
peak in the fifth scale $\lambda_5$ is at a lag of 0 days.
If we are to investigate a possible lead/lag relationship between the two series, then
the wavelet cross-correlation must be estimated for various lags. Figure 6.23 shows the
estimated wavelet cross-correlation between the SOI and Truk Island station pressure
series. The large positive peak in the first five scales is at a lag of 1 day for scales
$\lambda_1$-$\lambda_2$, a lag of 2 days for scale $\lambda_3$, a lag of 4 days for
scale $\lambda_4$ and zero days for scale $\lambda_5$. In the fifth scale, the largest
negative value is at a lag of 20 days. The higher scales do not show any apparent trend
when looking at lags up to 240 days. A possible interpretation for the first four scales
is that of weather patterns as they travel from West to East (cf. Figure 6.9). The
abrupt change in the lead/lag relationship of the wavelet cross-correlation at the fifth
scale is most likely due to the MJO. Patterns in higher scales (lower frequencies)
correspond to semi-annual, annual, and inter-annual trends. The ability of the wavelet
cross-covariance to decompose the usual covariance between these two series on a scale
by scale basis allows these interpretations to be made.
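A lagged wavelet cross-correlation of this kind can be sketched as follows. This is only an illustration under stated assumptions: the Haar filter is swapped in for the D(4) filter used in the analysis because its level-j MODWT filter has a simple closed form, boundary coefficients are dropped by using a "valid" convolution, the 1/M normalization is an assumption, and the function names and simulated data are hypothetical.

```python
import numpy as np

def haar_modwt_wavelet(x, level):
    """Non-boundary Haar MODWT wavelet coefficients at the given level."""
    x = np.asarray(x, dtype=float)
    half = 2 ** (level - 1)
    # Level-j Haar MODWT filter: +1/2^j on the first half, -1/2^j on the second.
    h = np.concatenate([np.full(half, 1.0), np.full(half, -1.0)]) / 2 ** level
    # mode="valid" keeps only coefficients unaffected by the series boundary.
    return np.convolve(x, h, mode="valid")

def wavelet_cross_correlation(x, y, level, max_lag):
    """Correlation between W^X_{j,t} and W^Y_{j,t+tau} for |tau| <= max_lag."""
    wx = haar_modwt_wavelet(x, level)
    wy = haar_modwt_wavelet(y, level)
    m = len(wx)
    lags = np.arange(-max_lag, max_lag + 1)
    cov = np.empty(len(lags))
    for i, tau in enumerate(lags):
        if tau >= 0:
            cov[i] = np.sum(wx[: m - tau] * wy[tau:]) / m
        else:
            cov[i] = np.sum(wx[-tau:] * wy[: m + tau]) / m
    # Standardize by the wavelet variances of the two series at this level.
    return lags, cov / np.sqrt(np.mean(wx ** 2) * np.mean(wy ** 2))

# Simulated example (assumed data): y is x delayed by two time steps.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.roll(x, 2)
lags, rho = wavelet_cross_correlation(x, y, level=2, max_lag=10)
peak_lag = int(lags[np.argmax(rho)])
```

Repeating the call for each level gives one cross-correlation curve per scale, which is the layout of Figure 6.23.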
Although direct comparison between Figure 6.23 and Figure 6.18 is not appropriate,
because the wavelet correlation does not decompose the correlation between two
stationary processes, the wavelet covariance does decompose the covariance between
two time series. Since the wavelet correlation is simply the wavelet covariance
standardized at each scale, the shape of each wavelet cross-correlation is the same even
though the magnitudes are off. Hence, we may make a rough comparison between
the two, keeping in mind the facts just stated. The first obvious difference is the
fact that $\hat\rho^{(p)}_{XY,\tau}$ is positive for all negative lags. Looking at
Figure 6.23, we see that the larger scales ($\lambda_9$ and $\lambda_{10}$) are all
positive and contribute to this feature, whereas for positive lags they are close to
zero and allow the annual scale ($\lambda_8$) to dominate. The two dips on either side
of the peak at lag +1 are the superposition of the first six scales in Figure 6.23. The
subsequent peak around a lag of +40 days is a result of the negative correlations for
scales $\lambda_5$ and $\lambda_6$ pushing down the annual correlation ($\lambda_8$).
It is, most likely, not an interesting feature in the association between these
two processes. The interaction of the correlation structure on a scale by scale basis
[Figure 6.24 plot omitted; panels for Truk station pressure and SOI, "winter" and
"summer"; vertical axis: time-varying wavelet variance, horizontal axis: time.]
Figure 6.24: Time-varying wavelet variance for the Truk Island station pressure series
and SOI at the fifth scale ($\lambda_5$), using the MODWT and D(4) wavelet filter. The
"winter" period corresponds to November through April and the "summer" period
corresponds to May through October.
for the Truk Island station pressure series, 1959, 1974, 1978 and 1990. These extreme
years for the Truk Island series, before 1990, agree with the results from Anderson,
Stevens, and Julian (1984).
It is evident, from the analysis presented here, that a seasonal pattern exists in the
Madden-Julian oscillation, even in locations close to the equator. This is a feature
not easily recognized using classical spectral techniques. This increased knowledge
of MJO variability changing with time is exciting, and will hopefully allow research
scientists to better describe similar physical phenomena.
[Figure 6.25 plot omitted.]
Figure 6.25: Indicator of ENSO activity, constructed by combining the last two
wavelet details and wavelet smooth from the multiresolution analysis of the Southern
Oscillation Index (cf. Figure 6.20), i.e.,
$\widetilde{\mathcal D}_9 + \widetilde{\mathcal D}_{10} + \widetilde{\mathcal S}_{10}$.
The sample mean was removed and the series was inverted in order to agree with the
conventional SOI.
lower plot is the time-varying wavelet variance during El Niño periods, when ENSO
activity is negative. There is no apparent difference between the two time series; the
median value for La Niña periods is 0.19 while the median value for El Niño periods
is 0.16.
The time-varying wavelet covariance for scale $\lambda_5$, defined to be the product of
scale $\lambda_5$ wavelet coefficients computed from the SOI and station pressure series
collected at Truk Island, is given in the bottom half of Figure 6.26. Again, the upper
plot is the time-varying wavelet covariance during La Niña periods and the lower plot is
the time-varying wavelet covariance during El Niño periods. Here, the wavelet covariance
during El Niño periods has much higher extreme values in the early 1990s, but it is
still difficult to distinguish between La Niña and El Niño periods. The median value
for La Niña periods is 0.036 while the median value for El Niño periods is 0.006.
Figure 6.27 provides similar information to that of Figure 6.26 for scale $\lambda_4$, which
[Figure 6.26 plot omitted; vertical axis: time-varying wavelet quantities,
horizontal axis: time.]
Figure 6.26: Time-varying wavelet quantities, for the scale associated with the
Madden-Julian oscillation ($\lambda_5$), partitioned into El Niño and La Niña periods.
The upper two plots display the wavelet variance and the lower two display the wavelet
covariance.
[Figure 6.27 plot omitted; recoverable panel titles: "Variance for ENSO < -0.5" and
"Covariance for ENSO < -0.5"; vertical axis: time-varying wavelet quantities,
horizontal axis: time.]
Figure 6.27: Time-varying wavelet quantities, for the scale associated with shorter
periods than the Madden-Julian oscillation ($\lambda_4$), partitioned into El Niño and
La Niña periods. The upper two plots display the wavelet variance and the lower two
display the wavelet covariance.
is associated with shorter periods than the MJO. The time-varying wavelet variance
for the Truk Island station pressure series (top two plots) appears to have a greater
number of large coefficients in the La Niña periods. This could indicate a potential
frequency shift in the MJO similar to the one discussed in Gray (1988), who looked at
station pressure and sea surface temperature anomalies. The time-varying wavelet
covariances between the SOI and station pressure series collected at Truk Island (bottom
two plots) are quite similar, as was the case with scale $\lambda_5$.
Chapter 7
CONCLUSIONS AND FUTURE DIRECTIONS
This chapter contains ideas for future work that would complement the material
I have presented in the two major areas of my dissertation. Final comments are also
provided.
Information Criterion (SIC), also known as the Bayesian Information Criterion (BIC),
is defined to be $-2 \log L(\hat\theta) + p \log N$, where $L(\hat\theta)$ is the
maximum likelihood function for the model, $p$ is the number of free parameters in the
model, and $N$ is the sample size. Two models are being compared, the null hypothesis
$H_0$ of Equation (4.6) and an alternative hypothesis $H_1$ with two distinct variances,
i.e.,
\[
H_1 : \sigma_1^2 = \cdots = \sigma_k^2 \ne \sigma_{k+1}^2 = \cdots = \sigma_N^2 .
\]
We reject $H_0$ if $\mathrm{SIC}(N) > \mathrm{SIC}(k)$ for some $k$ and estimate the
position of the change point $k_0$ by $\hat k$ such that
\[
\mathrm{SIC}(\hat k) = \min_{1 \le k \le N} \mathrm{SIC}(k),
\]
where $\mathrm{SIC}(N)$ is the SIC under $H_0$ and $\mathrm{SIC}(k)$ is the SIC under
$H_1$ for $k = 1, \ldots, N-1$. Here
\[
\mathrm{SIC}(N) \equiv N \log 2\pi + N \log \hat\sigma^2 + N + \log N
\]
and
\[
\mathrm{SIC}(k) \equiv N \log 2\pi + k \log \hat\sigma_k^2 + (N-k) \log \hat\sigma_{>k}^2 + N + 2 \log N,
\]
where
\[
\hat\sigma^2 \equiv \frac{1}{N} \sum_{i=1}^{N} X_i^2, \qquad
\hat\sigma_k^2 \equiv \frac{1}{k} \sum_{i=1}^{k} X_i^2 \qquad \text{and} \qquad
\hat\sigma_{>k}^2 \equiv \frac{1}{N-k} \sum_{i=k+1}^{N} X_i^2 .
\]
Since we require at least one sample in each estimate, we can only detect changes
for $2 \le k_0 \le N-1$. To eliminate, or at least suppress, the possibility of random
fluctuations in the data contributing to the difference between the SICs, Chen and
Gupta (1997) introduced a significance level $\alpha$ and its associated critical value
$c_\alpha$. Hence, $H_0$ is rejected if $\mathrm{SIC}(N) > \mathrm{SIC}(k) + c_\alpha$
for some $2 \le k_0 \le N-2$. Approximate values of $c_\alpha$ can be obtained using
the formula
\[
c_\alpha = \left\{ -\frac{1}{a(\log N)} \log\log \left[ 1 - \alpha + \exp\left\{ -2 e^{b(\log N)} \right\} \right]^{-1/2} + \frac{b(\log N)}{a(\log N)} \right\}^2 - \log N
\]
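The SIC comparison for a single variance change can be sketched directly from the two expressions for SIC(N) and SIC(k) given above. This is a minimal illustration for a zero-mean series: the function name `sic_variance_change` and the simulated data are assumptions, and the critical value $c_\alpha$ is deliberately left out because the functions $a(\cdot)$ and $b(\cdot)$ are not defined in this section.

```python
import numpy as np

def sic_variance_change(x):
    """Return SIC(N), the vector SIC(k) for k = 1..N-1, and the minimizing k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    csum = np.cumsum(x ** 2)   # running sums of squares
    total = csum[-1]
    # SIC under H0: a single variance for the whole (zero-mean) series.
    sic_null = n * np.log(2 * np.pi) + n * np.log(total / n) + n + np.log(n)
    k = np.arange(1, n)        # candidate change points 1, ..., N-1
    var_left = csum[:-1] / k                     # variance estimate for X_1..X_k
    var_right = (total - csum[:-1]) / (n - k)    # variance estimate for X_{k+1}..X_N
    # SIC under H1: two variances, hence the extra log N penalty.
    sic_alt = (n * np.log(2 * np.pi) + k * np.log(var_left)
               + (n - k) * np.log(var_right) + n + 2 * np.log(n))
    k_hat = int(k[np.argmin(sic_alt)])
    return sic_null, sic_alt, k_hat

# Simulated example (assumed data): the standard deviation triples after t = 300.
rng = np.random.default_rng(0)
x = np.concatenate([rng.standard_normal(300), 3.0 * rng.standard_normal(300)])
sic_null, sic_alt, k_hat = sic_variance_change(x)
change_detected = sic_null > sic_alt.min()   # comparison without the c_alpha margin
```

In practice the comparison would be made against `sic_alt.min() + c_alpha` rather than against the bare minimum, as described in the text.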
7.3 Refinement of the Multiple Variance Change Testing Procedure
At the present time, the procedure for detecting multiple variance changes in time
series is rather crude. The DWT is used solely for testing and the MODWT for locating
the variance change points. The information from one set of wavelet coefficients
is not used to influence the other procedure. In this sense, I am making a "leap of
faith" in that the DWT and MODWT wavelet coefficients will identically represent
features in the original time series. This is most likely not true, especially at higher
scales. If several peaks occur in the rotated cumulative variance, then it may well be
the case that the maximum value at scale $\lambda_j$ from the DWT and MODWT will not
correspond with the same location.
To ensure correspondence between the DWT and MODWT maxima, I could include a
logical statement which determines if the two locations are roughly equivalent.
If so, the location is kept as a significant variance change; otherwise the next highest
value from the MODWT could be selected and compared with the DWT maximum.
This is repeated until an agreement is reached between the two transforms.
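One possible form of such a logical statement is sketched below. It is purely hypothetical: the function names, the use of local maxima of the MODWT statistic as the candidate locations, the tolerance window, and the assumption that the DWT location has already been mapped onto the time axis of the MODWT coefficients are all choices made for illustration, not part of the procedure as implemented.

```python
import numpy as np

def local_maxima(stat):
    """Indices of interior local maxima of a one-dimensional statistic."""
    stat = np.asarray(stat, dtype=float)
    interior = (stat[1:-1] > stat[:-2]) & (stat[1:-1] >= stat[2:])
    return np.flatnonzero(interior) + 1

def reconcile_change_point(dwt_location, modwt_stat, tolerance):
    """Largest MODWT peak within `tolerance` of the DWT location, else None."""
    modwt_stat = np.asarray(modwt_stat, dtype=float)
    peaks = local_maxima(modwt_stat)
    # Visit the MODWT peaks from the highest value to the lowest.
    ranked = peaks[np.argsort(modwt_stat[peaks])[::-1]]
    for location in ranked:
        if abs(int(location) - dwt_location) <= tolerance:
            return int(location)
    return None  # no MODWT peak agrees with the DWT maximum

# Toy statistic (assumed): the global MODWT peak at 10 disagrees with the DWT
# location 64, so the next highest peak at 60 is selected instead.
stat = np.zeros(100)
stat[10] = 5.0
stat[60] = 4.0
matched = reconcile_change_point(64, stat, tolerance=8)
```

How wide the tolerance should be at each scale is left open here; a natural candidate would grow with the width of the level-j wavelet filter.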
A more appealing fix to this problem is to detect and locate variance change
points using the MODWT. This is not currently possible, since we lack an asymptotic
distribution for the test statistic when using MODWT wavelet coefficients. Monte
Carlo studies are possible when testing for a single variance change (and therefore
a fixed sample size), but the multiple testing procedure guarantees not knowing the
sample size after the first split, making Monte Carlo results difficult to utilize.
The use of an equivalent degrees of freedom argument has been shown to reasonably
approximate the distribution of the wavelet variance and has also proven sufficient, for
certain sample sizes, to modify the test statistic $D$ computed with MODWT wavelet
coefficients.
[Figure plot omitted; panels: Level 1 through Level 6; vertical axis: covariance
critical values.]
Previous work on the distribution of the product of two normally distributed
variables may be applicable here; see, e.g., Craig (1936) and Aroian (1947). Torrence
and Compo (1998) utilize the distribution of the square root of the product of two
$\chi^2$ random variables when computing confidence intervals for their cross-wavelet
power.
rived the asymptotic marginal and joint distributions of bivariate spectral estimators.
These results may be adapted for use with our bivariate wavelet estimators.
covariance and cross-correlation. With respect to the former area, I have tried to
provide a more thorough investigation of ideas from the study of long memory pro-
cesses, change-point detection and the ability of the DWT to approximately decor-
relate time series on a scale by scale basis. The concepts of wavelet covariance and
correlation are natural extensions of the work by D. B. Percival and others on the
wavelet variance.
The DWT is a powerful mathematical tool, enabling statisticians to examine
much more complicated processes by separating features on a scale by scale basis.
However, it cannot be applied to problems without caution. Which wavelet filter to
use is a very important issue. To help select an appropriate wavelet filter, several
filters of various lengths should be applied to the data and visually analyzed to help
detect the potential leakage of low frequency features throughout the multiresolution
analysis. Most importantly, serious thought should be put into how the shape of
the underlying wavelet filter matches the physical process from which the data were
sampled. Keeping these issues in mind, there should be a great variety of problems
where wavelet analysis is beneficial.
BIBLIOGRAPHY
Abraham, B. and W. W. S. Wei (1984). Inferences about the parameters of a time
series model with changing variance. Metrika 31, 183-194.
Abry, P. and D. Veitch (1998). Wavelet analysis of long-range-dependent traffic.
IEEE Transactions on Information Theory 44 (1), 2-15.
Allan, D. W. (1966). Statistics of atomic frequency standards. Proceedings of the
IEEE 54, 221-230.
Anderson, J. R., D. E. Stevens, and P. R. Julian (1984). Temporal variations of the
tropical 40-50 day oscillation. Monthly Weather Review 112 (12), 2431-2438.
Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John
Wiley and Sons, Inc.
Aroian, L. A. (1947). The probability function of the product of two normally
distributed variables. The Annals of Mathematical Statistics 18, 265-271.
Balek, J. (1977). Hydrology and Water Resources in Tropical Africa, Volume 8 of
Developments in Water Science. New York: Elsevier Scientific Pub. Co.
Beran, J. (1994). Statistics for Long-Memory Processes, Volume 61 of Monographs
on Statistics and Applied Probability. New York: Chapman & Hall.
Beran, J. and N. Terrin (1996). Testing for a change of the long-memory parameter.
Biometrika 83 (3), 627-638.
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley
& Sons.
655-668.
Mehrabi, A. R., H. Rassamdana, and M. Sahimi (1997). Characterization of
long-range correlations in complex distributions and profiles. Physical Review
E 56 (1), 712-722.
Mohr, D. L. (1981). Modeling Data as a Fractional Gaussian Noise. Ph.D. thesis,
Princeton University.
Nuri, W. A. and L. J. Herbst (1969). Fourier methods in the study of variance
fluctuations in time series analysis. Technometrics 11 (1), 103-113.
Ogden, R. T. (1994). Wavelet Thresholding in Nonparametric Regression with
Change-Point Applications. Ph.D. thesis, Texas A&M University.
Ogden, R. T. (1997). Essential Wavelets for Statistical Applications and Data
Analysis. Boston: Birkhäuser.
Ogden, R. T. and E. Parzen (1996). Change-point approach to data analytic
wavelet thresholding. Statistics and Computing 6 (2), 93-99.
Percival, D. B. (1983). The Statistics of Long Memory Processes. Ph.D. thesis,
Department of Statistics, University of Washington.
Percival, D. B. (1992). Simulating Gaussian random processes with specified
spectra. Computing Science and Statistics 24, 534-538.
Percival, D. B. (1993). Three curious properties of the sample variance and
autocovariance for stationary processes with unknown mean. The American
Statistician 47 (4), 274-276.
Percival, D. B. (1994). Spectral analysis of univariate and bivariate time series.
In J. L. Stanford and S. B. Vardeman (Eds.), Statistical Methods for Physical
Science, Volume 28 of Methods of Experimental Physics, pp. 313-348. Boston:
Academic Press, Inc.
Riedel, K. S. and A. Sidorenko (1995). Minimum bias multiple taper spectral
estimation. IEEE Transactions on Signal Processing 43 (1), 188-195.
Schuster, A. (1898). On the investigation of hidden periodicities with application to
a supposed 26-day period of meteorological phenomena. Terrestrial Magnetism 3,
13-41.
Serroukh, A., A. T. Walden, and D. B. Percival (1998). Statistical properties of
the wavelet variance estimator for non-Gaussian/non-linear time series.
Technical Report 98-03, Department of Mathematics, Imperial College of Science,
Technology & Medicine.
Slepian, D. (1978). Prolate spheroidal wave functions, Fourier analysis, and
uncertainty - V: The discrete case. Bell System Technical Journal 57, 1371-1430.
Srivastava, M. S. (1993). Comparison of CUMSUM and EWMA procedures for
detecting a shift in the mean or an increase in the variance. Journal of Applied
Statistical Science 1 (4), 445-468.
Stephens, M. A. (1970). Use of the Kolmogorov-Smirnov, Cramér-von Mises and
related statistics without extensive tables. Journal of the Royal Statistical
Society B 32 (1), 115-122.
Stephens, M. A. (1986). Tests based on EDF statistics. In R. B. D'Agostino and
M. A. Stephens (Eds.), Goodness-of-Fit Techniques, Volume 68 of STATISTICS:
Textbooks and Monographs, pp. 97-193. New York: Marcel Dekker.
Tewfik, A. H. and M. Kim (1992). Correlation structure of the discrete wavelet
coefficients of fractional Brownian motion. IEEE Transactions on Information
Theory 38 (2), 904-909.
Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. Proceedings of
the IEEE 70 (9), 1055-1096.
Wichern, D. W., R. B. Miller, and D.-A. Hsu (1976). Changes of variance in
first-order autoregressive time series models - with an application. Applied
Statistics 25 (3), 248-256.
Wickerhauser, M. V. (1994). Adapted Wavelet Analysis from Theory to Software.
Wellesley, Massachusetts: A K Peters, Ltd.
Wornell, G. W. (1993). Wavelet-based representations for the 1/f family of fractal
processes. Proceedings of the IEEE 81 (10), 1428-1450.
Wornell, G. W. (1996). Signal Processing with Fractals: A Wavelet Based Approach.
New Jersey: Prentice Hall.
Appendix A
FOURIER THEORY AND FILTERING
Wavelet methodology shares the basic goals of its Fourier cousin, to transform
signals into a different domain so that interesting features may be brought to the
surface. This is done by using basis functions that differ from the sines and cosines
utilized by the discrete Fourier transform (DFT). Having been developed much later,
the notation and concepts of wavelet methodology borrow a great deal from the well
established fields of filtering and Fourier analysis. We will now outline basic concepts
and notation which will be used repeatedly in this dissertation.
with which to analyze a time series of observations, the discrete wavelet transform
(DWT) uses a formula similar to Equation (A.1) that utilizes sequences which differ
fundamentally from the complex exponentials.
The inverse DFT (synthesis) of $\{x_t\}$ is given by
\[
x_t = \int_{-1/2}^{1/2} A(f) \, e^{i2\pi ft} \, df, \qquad t = 0, \pm 1, \pm 2, \ldots.
\]
The integral is non-zero only when $t = t'$, thus the inverse DFT is established. The
relationship between $\{x_t\}$ and $X(\cdot)$ is summarized by calling them a Fourier
transform pair using the notation
\[
\{x_t\} \longleftrightarrow X(\cdot).
\]
\[
\{x_{t-\tau}\} \longleftrightarrow e^{-i2\pi f\tau} X(f).
\]
The converse, that is, that a shift in frequency in the DFT of a sequence is equivalent
to multiplying that sequence by a phase factor, follows from a similar
argument applied to the inverse DFT,
\[
\begin{aligned}
\int_{-1/2}^{1/2} A(f - \nu) \, e^{i2\pi ft} \, df
&= \int_{-1/2}^{1/2} A(f') \, e^{i2\pi (f' + \nu) t} \, df' \\
&= e^{i2\pi \nu t} \int_{-1/2}^{1/2} A(f') \, e^{i2\pi f' t} \, df' = e^{i2\pi \nu t} x_t .
\end{aligned}
\]
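\[
\{x * y_t\} \longleftrightarrow X(f) Y(f). \tag{A.2}
\]
The proof is seen by applying the definitions to the left-hand side of
Equation (A.2) and evaluating the resulting expression
\[
\begin{aligned}
\sum_{t=-\infty}^{\infty} x * y_t \, e^{-i2\pi ft}
&= \sum_{t=-\infty}^{\infty} \left( \sum_{u=-\infty}^{\infty} x_u y_{t-u} \right) e^{-i2\pi ft} \\
&= \sum_{u=-\infty}^{\infty} x_u \sum_{t=-\infty}^{\infty} y_{t-u} \, e^{-i2\pi ft} \\
&= \sum_{u=-\infty}^{\infty} x_u \sum_{\tau=-\infty}^{\infty} y_\tau \, e^{-i2\pi f(\tau + u)} \\
&= \left( \sum_{u=-\infty}^{\infty} x_u e^{-i2\pi fu} \right) \left( \sum_{\tau=-\infty}^{\infty} y_\tau e^{-i2\pi f\tau} \right) = X(f) Y(f).
\end{aligned}
\]
Conversely, multiplication of the original sequences corresponds to a convolution of
the DFTs,
\[
\{x_t y_t\} \longleftrightarrow X * Y(f).
\]
A proof of this statement is similar to that of Equation (A.2).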
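The convolution property can be checked numerically for finite sequences. The sketch below assumes finitely supported sequences and uses NumPy's FFT, whose sign convention matches the $e^{-i2\pi ft}$ used here; zero-padding both sequences to the length of their convolution samples the two transforms on the common grid $f = k/n$. The variable names and random inputs are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
y = rng.standard_normal(5)

conv = np.convolve(x, y)        # finite-support version of the convolution x * y_t
n = len(conv)                   # 8 + 5 - 1 = 12 samples

# DFT of the convolution versus the product of the (zero-padded) DFTs.
lhs = np.fft.fft(conv)
rhs = np.fft.fft(x, n) * np.fft.fft(y, n)
max_err = np.max(np.abs(lhs - rhs))
```

The two sides agree up to floating point rounding, which is the discrete analogue of Equation (A.2).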
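where $\{x_t\}$ and $\{y_t\}$ are square-summable sequences with
$\{x_t\} \longleftrightarrow X(\cdot)$ and $\{y_t\} \longleftrightarrow Y(\cdot)$. If we
let $x_t = y_t$, then we have Parseval's relation
\[
\sum_{t=-\infty}^{\infty} |x_t|^2 = \int_{-1/2}^{1/2} |X(f)|^2 \, df .
\]
The proof of Equation (A.3) is done by substituting the definition of the DFTs
$X(\cdot)$ and $Y(\cdot)$ in Equation (A.3) and evaluating the resulting integral
\[
\begin{aligned}
\int_{-1/2}^{1/2} X(f) Y^*(f) \, df
&= \int_{-1/2}^{1/2} \left( \sum_{t=-\infty}^{\infty} x_t e^{-i2\pi ft} \right) \left( \sum_{t'=-\infty}^{\infty} y^*_{t'} e^{i2\pi ft'} \right) df \\
&= \sum_{t=-\infty}^{\infty} \sum_{t'=-\infty}^{\infty} x_t y^*_{t'} \int_{-1/2}^{1/2} e^{i2\pi f(t' - t)} \, df = \sum_{t=-\infty}^{\infty} x_t y^*_t
\end{aligned}
\]
since the integral of the modulated complex exponential is one if $t = t'$ and zero
otherwise.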
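A finite-length analogue of this identity can be verified numerically. In the sketch below (illustrative names and random inputs), the integral over the unit frequency interval is replaced by an average over the N Fourier frequencies $f_k = k/N$, which is exact for sequences of length N.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(32) + 1j * rng.standard_normal(32)
y = rng.standard_normal(32) + 1j * rng.standard_normal(32)
X, Y = np.fft.fft(x), np.fft.fft(y)

# Two-sequence form: sum of x_t * conj(y_t) versus the average of X * conj(Y).
time_side = np.sum(x * np.conj(y))
freq_side = np.mean(X * np.conj(Y))

# Parseval's relation is the special case y = x.
energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.mean(np.abs(X) ** 2)
```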
\[
H(f) = |H(f)| \, e^{i\theta(f)}
\]
where $|H(f)|$ is the gain function and $\theta(f)$ is the phase function for the
filter. We will see squared gain functions for several wavelet filters in Section 3.1.
It is often the case that not one, but several filters are used to analyze a sequence.
A cascade of filters is a series of $J$ filters such that the output from the
first filter is the input to the second filter and so on. If
$\{h_{j,t} \mid t = 0, \pm 1, \pm 2, \ldots\}$, $j = 1, \ldots, J$, are a series of
filters with transfer functions $H_j(\cdot)$, then the output from the cascade of
filters can be expressed as
\[
y_t = \sum_{u=-\infty}^{\infty} h_u x_{t-u}, \qquad t = 0, \pm 1, \pm 2, \ldots
\]
where $\{h_t\}$ is the equivalent filter for the cascade whose transfer function is
given by
\[
H(f) \equiv \prod_{j=1}^{J} H_j(f). \tag{A.4}
\]
The output from the first filter $\{h_{1,t}\}$ has DFT $H_1(f) X(f)$. After applying
the $J$ filters, the DFT of $\{y_t\}$ is
$Y(f) \equiv H_1(f) H_2(f) \cdots H_J(f) X(f)$. Using Equation (A.4) and the convolution
property of the Fourier transform, $Y(f) = H(f) X(f)$ and, therefore, $\{y_t\}$ is
simply the convolution of $\{x_t\}$ with $\{h_t\}$.
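The equivalence between a cascade and its single equivalent filter can be illustrated with short finite filters. The three filters and the input below are arbitrary choices for the example, not filters used elsewhere in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(50)
h1 = np.array([0.5, 0.5])
h2 = np.array([0.5, -0.5])
h3 = np.array([0.25, 0.5, 0.25])

# Apply the filters one after another (the cascade).
y_cascade = x
for h in (h1, h2, h3):
    y_cascade = np.convolve(y_cascade, h)

# Equivalent filter: the convolution of the individual filters.
h_equiv = np.convolve(np.convolve(h1, h2), h3)
y_direct = np.convolve(x, h_equiv)

# Transfer functions on a common frequency grid: product versus equivalent filter.
n_freq = 64
H_product = np.fft.fft(h1, n_freq) * np.fft.fft(h2, n_freq) * np.fft.fft(h3, n_freq)
H_equiv = np.fft.fft(h_equiv, n_freq)
```

Both comparisons agree to rounding error, which mirrors Equation (A.4) and the statement that the cascade output is the convolution of the input with the equivalent filter.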
Appendix B
UNIVARIATE SPECTRAL ANALYSIS
B.1 Introduction
As with the Fourier transform, we also require concepts from the spectral analysis
of time series in order to better describe and understand wavelet methodology. The
topics described here can be found, using similar notation, within Percival and Walden
(1993) and, with much greater detail, within Priestley (1981).
Let us begin with the spectral representation theorem for a discrete parameter
stationary process. There exists an orthogonal process $\{Z(f)\}$ defined on the
interval $[-1/2, 1/2]$ such that
\[
X_t \overset{\mathrm{ms}}{=} \int_{-1/2}^{1/2} e^{i2\pi ft} \, dZ(f) \tag{B.1}
\]
for all integers $t$, where the equality is in the mean square sense. That is, the
squared norm of the difference between the left-hand side and right-hand side is zero.
We define $E\{|dZ(f)|^2\} \equiv dS^{(I)}(f)$ for all $|f| \le 1/2$, and call
$S^{(I)}(\cdot)$ the integrated spectrum of $\{X_t\}$. For our purposes here, we will
assume the integrated spectrum is differentiable everywhere with derivative
$S(\cdot)$, so that
The autocovariance sequence (acvs) of a stationary process $\{X_t\}$, with zero mean,
can be written as
\[
s_\tau \equiv E\{X_t X_{t+\tau}\} = \int_{-1/2}^{1/2} S(f) \, e^{i2\pi f\tau} \, df .
\]
The disadvantages of the periodogram are well documented; see, for example, Percival
and Walden (1993, p. 197). We will not concern ourselves with such matters,
except to point out the existence of one of several alternative spectral estimators:
the multitaper spectral estimator (Thomson 1982; Percival and Walden 1993, Ch. 7).
We introduce a set of $K$ orthonormal data tapers $\{h_{t,k} \mid t = 1, \ldots, N\}$,
where $k$ ranges from 0 to $K-1$; i.e., $\sum_{t=1}^{N} h_{t,j} h_{t,k} = 1$ if $j = k$
and 0 if $j \ne k$. Examples of common data tapers are the sine tapers (Riedel and
Sidorenko 1995) and discrete prolate spheroidal sequences (dpss) data tapers (Slepian
1978; Thomson 1982; Percival and Walden 1993, Ch. 8). Sine tapers were designed to
minimize the spectral window bias and can be approximated well using the following
closed form expression
\[
h^{(\mathrm{sine})}_{t,k} = \left( \frac{2}{N+1} \right)^{1/2} \sin\left( \frac{(k+1)\pi t}{N+1} \right).
\]
In contrast, the dpss data tapers minimize the spectral window sidelobes, using a
resolution bandwidth parameter $W$, and must be calculated using techniques such
as inverse iteration, numerical integration or a tridiagonal formulation (Percival and
Walden 1993, Ch. 8). The role of any data taper is to protect against leakage; the
sine tapers provide moderate leakage protection, whereas the dpss data tapers offer
adjustable leakage protection through the parameter $W$. In practice there is little
difference in the multitaper spectral estimators when using either data taper.
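The closed form expression for the sine tapers is easy to evaluate, and its orthonormality can be checked directly. The function name `sine_tapers` and the choice of N = 128 and K = 5 are assumptions made for this sketch.

```python
import numpy as np

def sine_tapers(n, n_tapers):
    """Rows are the sine tapers h_{t,k}, k = 0..n_tapers-1, t = 1..n."""
    t = np.arange(1, n + 1)
    k = np.arange(n_tapers)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(k + 1, t) / (n + 1))

tapers = sine_tapers(128, 5)
gram = tapers @ tapers.T   # inner products between all pairs of tapers
```

The matrix of inner products is the identity up to rounding, so tapers built from the closed form satisfy the orthonormality condition stated above.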
The typical multitaper spectral estimator is given by
\[
\hat S^{(mt)}(f) = \frac{1}{K} \sum_{k=0}^{K-1} \hat S_k^{(mt)}(f)
\qquad \text{with} \qquad
\hat S_k^{(mt)}(f) = \left| \sum_{t=1}^{N} h_{t,k} X_t e^{-i2\pi ft} \right|^2 .
\]
Thus, the multitaper spectral estimator is the average of several direct spectral
estimators (more specifically, eigenspectra) using a set of orthonormal data tapers.
Multitaper spectral estimators overcome several of the inadequacies of the periodogram
and possess reasonable bias, variance and resolution properties.
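A minimal sketch of this estimator follows, evaluated at the non-negative Fourier frequencies with a unit sampling interval. The use of sine tapers, the function names, and the white noise test series are assumptions of the example rather than the configuration used for the analyses in this dissertation.

```python
import numpy as np

def sine_tapers(n, n_tapers):
    """Rows are the sine tapers h_{t,k}, k = 0..n_tapers-1, t = 1..n."""
    t = np.arange(1, n + 1)
    k = np.arange(n_tapers)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(k + 1, t) / (n + 1))

def multitaper_spectrum(x, tapers):
    """Average of the eigenspectra |sum_t h_{t,k} x_t exp(-i 2 pi f t)|^2."""
    x = np.asarray(x, dtype=float)
    # One tapered copy of the series per row, transformed along the time axis.
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(len(x))   # frequencies 0, 1/N, ..., 1/2
    return freqs, eigenspectra.mean(axis=0)

# Unit-variance white noise has a flat spectrum equal to one on [-1/2, 1/2].
rng = np.random.default_rng(3)
x = rng.standard_normal(512)
freqs, s_mt = multitaper_spectrum(x, sine_tapers(512, 5))
```

Because each taper has unit energy, no further normalization is needed; the estimate fluctuates around the true flat spectrum of the simulated noise.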
where $C_h$ is a constant which depends on the data taper used (see Table 248 in
Percival and Walden (1993) for values of $C_h$) and $W_m(\cdot)$ is the smoothing
window. Hence, the quantity $\nu$ is the equivalent degrees of freedom of the spectral
estimator $\hat S(f)$.
Appendix C
BIVARIATE SPECTRAL ANALYSIS
The material presented in the following sections closely follows an introduction to
bivariate spectral analysis in Percival (1994), and is a natural extension of univariate
topics found in Percival and Walden (1993) using similar notation. A more thorough
introduction to multivariate spectral analysis can be found in, for example, Koopmans
(1974), Priestley (1981) and Brillinger (1981).
C.1 Introduction
Let $\{X_t\}$ and $\{Y_t\}$ be zero-mean weakly stationary processes with spectral
density functions (autospectra) $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. The cross
spectral density function (csdf) of $\{X_t, Y_t\}$ is defined to be
\[
S_{XY}(f) = \sum_{\tau=-\infty}^{\infty} C_{XY,\tau} \, e^{-i2\pi f\tau}, \qquad -\tfrac{1}{2} \le f \le \tfrac{1}{2},
\]
where $C_{XY,\tau}$ is the cross covariance sequence (ccvs) given by
\[
C_{XY,\tau} = \mathrm{Cov}\{X_t, Y_{t+\tau}\} = E\{X_t Y_{t+\tau}\}.
\]
The complete spectral properties of a bivariate time series at frequency $f$ can be
summarized by the spectral matrix
\[
\mathbf{S}(f) \equiv \begin{bmatrix} S_X(f) & S_{XY}(f) \\ S_{YX}(f) & S_Y(f) \end{bmatrix}. \tag{C.1}
\]
Although this is not a symmetric matrix, there are numerous ways of expressing the
cross-diagonal terms (Brillinger 1981, p. 23), i.e.,
\[
S_{XY}(f) = S^*_{XY}(-f) = S_{YX}(-f) = S^*_{YX}(f).
\]
Thus, the spectral matrix can be expressed in terms of three distinct quantities instead
of four:
\[
\mathbf{S}(f) = \begin{bmatrix} S_X(f) & S_{XY}(f) \\ S_{XY}(-f) & S_Y(f) \end{bmatrix}.
\]
Whereas the spectrum of a real valued process is real valued, since the autocovariance
sequence is symmetric about 0, the csdf (or cross spectrum) is usually complex
valued. This allows us to express $S_{XY}(\cdot)$ in Cartesian form as
\[
S_{XY}(f) = R_{XY}(f) - i Q_{XY}(f)
\]
where $R_{XY}(\cdot)$ is the co-spectrum and $Q_{XY}(\cdot)$ is the quadrature spectrum.
It may also be expressed in polar notation as
\[
S_{XY}(f) = A_{XY}(f) \, e^{i\phi_{XY}(f)}
\]
where $A_{XY}(f) \equiv |S_{XY}(f)|$ is the amplitude spectrum and $\phi_{XY}(\cdot)$
is the phase spectrum. These new functions are at least real valued and may be more
easily handled than the cross spectrum. The complex coherency
\[
w_{XY}(f) = \frac{S_{XY}(f)}{\sqrt{S_X(f) S_Y(f)}} \tag{C.2}
\]
depends upon both the cross spectrum and the autospectra for $\{X_t\}$ and $\{Y_t\}$.
The complex coherency is a complex valued frequency domain "correlation coefficient."
It measures the correlation in the random amplitudes assigned to the complex
exponentials with frequency $f$ in the spectral representations of $\{X_t\}$ and
$\{Y_t\}$. The quantity $|w_{XY}(f)|^2$ is called the magnitude squared coherence (msc)
at the frequency $f$. Thus, we have
\[
|w_{XY}(f)|^2 = \frac{|S_{XY}(f)|^2}{S_X(f) S_Y(f)} = \frac{A^2_{XY}(f)}{S_X(f) S_Y(f)},
\]
that is, the msc is a normalized version of the square of the cross-amplitude spectrum.
The msc captures the "amplitude" part of the cross spectrum, but completely ignores
its phase, so the msc and phase spectrum can be used together to summarize the
"information" in the complex valued cross spectrum.
is utilized here to estimate the cross spectrum. The sample cross covariance sequence
is defined to be
\[
\hat C_{XY,\tau} \equiv \sum_t X_t Y_{t+\tau}
\]
where the summation goes from $t = 1$ to $N - \tau$ for $\tau \ge 0$ and from
$t = 1 - \tau$ to $N$ for $\tau < 0$. The cross periodogram can also be written in a
more computationally friendly form as
\[
\hat S^{(p)}_{XY}(f) = \frac{1}{N} \left( \sum_{t=1}^{N} X_t e^{-i2\pi ft} \right)^{*} \left( \sum_{t=1}^{N} Y_t e^{-i2\pi ft} \right) \tag{C.3}
\]
where the asterisk denotes complex conjugation.
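The multitaper estimator of the cross spectrum is given by
\[
\hat S^{(mt)}_{XY}(f) = \frac{1}{K} \sum_{k=1}^{K} \left( \sum_{t=1}^{N} h_{k,t} X_t e^{-i2\pi ft} \right)^{*} \left( \sum_{t=1}^{N} h_{k,t} Y_t e^{-i2\pi ft} \right)
\]
where $\{h_{k,t}\}$ is the $k$th-order data taper for a sequence of length $N$
normalized such that $\sum_t h^2_{k,t} = 1$, $k = 1, \ldots, K$ (cf. Section B.2).
Thus, the multitaper estimators for the phase spectrum and magnitude squared coherence
are given by
\[
\hat\phi^{(mt)}_{XY}(f) = \arg\left\{ \hat S^{(mt)}_{XY}(f) \right\}
\qquad \text{and} \qquad
\left| \hat w^{(mt)}_{XY}(f) \right|^2 = \frac{\left| \hat S^{(mt)}_{XY}(f) \right|^2}{\hat S^{(mt)}_X(f) \, \hat S^{(mt)}_Y(f)} .
\]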
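These estimators can be sketched in a few lines. The example below assumes sine tapers, a unit sampling interval, and the non-negative Fourier frequencies; the function names and the simulated series are hypothetical. It also illustrates why more than one taper is needed: with a single taper the estimated msc is identically one, whatever the two series are.

```python
import numpy as np

def sine_tapers(n, n_tapers):
    """Rows are orthonormal sine tapers of length n."""
    t = np.arange(1, n + 1)
    k = np.arange(n_tapers)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(k + 1, t) / (n + 1))

def multitaper_cross(x, y, tapers):
    """Multitaper cross spectrum, magnitude squared coherence and phase."""
    jx = np.fft.rfft(tapers * np.asarray(x, dtype=float), axis=1)
    jy = np.fft.rfft(tapers * np.asarray(y, dtype=float), axis=1)
    sxy = np.mean(np.conj(jx) * jy, axis=0)   # conjugate on the X factor
    sx = np.mean(np.abs(jx) ** 2, axis=0)     # multitaper autospectrum of x
    sy = np.mean(np.abs(jy) ** 2, axis=0)     # multitaper autospectrum of y
    msc = np.abs(sxy) ** 2 / (sx * sy)
    phase = np.angle(sxy)
    return sxy, msc, phase

rng = np.random.default_rng(4)
x = rng.standard_normal(256)
y = rng.standard_normal(256)      # independent of x
tapers = sine_tapers(256, 6)

_, msc_same, phase_same = multitaper_cross(x, x, tapers)        # identical series
_, msc_indep, _ = multitaper_cross(x, y, tapers)                # unrelated series
_, msc_one_taper, _ = multitaper_cross(x, y, tapers[:1])        # K = 1
```

For identical series the msc is one with zero phase, and for independent series it stays well below one on average when several tapers are averaged.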