
Assessing Nonstationary Time Series Using Wavelets

by
Brandon J Whitcher

A dissertation submitted in partial fulfillment


of the requirements for the degree of

Doctor of Philosophy

University of Washington
1998

Approved by
(Chairperson of Supervisory Committee)

Program Authorized
to Offer Degree

Date
In presenting this dissertation in partial fulfillment of the requirements for the
Doctoral degree at the University of Washington, I agree that the Library shall make
its copies freely available for inspection. I further agree that extensive copying of
this dissertation is allowable only for scholarly purposes, consistent with "fair use"
as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of
this dissertation may be referred to University Microfilms, 1490 Eisenhower Place,
P.O. Box 975, Ann Arbor, MI 48106, to whom the author has granted "the right to
reproduce and sell (a) copies of the manuscript in microfilm and/or (b) printed copies
of the manuscript made from microfilm."

Signature

Date
University of Washington

Abstract

Assessing Nonstationary Time Series Using Wavelets


by Brandon J Whitcher

Chairperson of Supervisory
Committee: Professor Peter Guttorp & Professor Donald B. Percival
Statistics & Applied Physics Laboratory

The discrete wavelet transform has been used extensively in the field of Statistics,
mostly in the area of "denoising signals" or nonparametric regression. This thesis
provides a new application for the discrete wavelet transform: assessing nonstationary
events in time series, especially long memory processes. Long memory processes are
those which exhibit substantial correlations between events separated by a long period
of time.
Departures from stationarity in these heavily autocorrelated time series, such as
an abrupt change in the variance at an unknown location or "bursts" of increased
variability, can be detected and accurately located using discrete wavelet transforms,
both orthogonal and overcomplete. A cumulative sum of squares method, utilizing
a Kolmogorov–Smirnov-type test statistic, is applied to this problem. Analyzing a
time series on a scale by scale basis, each scale corresponding to a range of
frequencies, makes it possible to detect and locate a sudden change in the variance of
the time series. Using this same procedure to detect a change in the long memory
parameter, when the process variance remains constant, is also briefly investigated.
Applications involve Nile River minimum water levels and vertical ocean shear
measurements.
In the atmospheric sciences, broadband features in the spectrum of recorded time
series have been hypothesized to be nonstationary events, e.g., the Madden–Julian
oscillation. The Madden–Julian oscillation is a result of large-scale circulation cells
oriented in the equatorial plane from the Indian Ocean to the central Pacific. The
oscillation has been noted to have higher frequencies during warm events in El Niño–
Southern Oscillation (ENSO) years. The concepts of wavelet covariance and wavelet
correlation are introduced and applied to this problem as an alternative to
cross-spectrum analysis. The wavelet covariance is shown to decompose the covariance
between two stationary processes on a scale by scale basis. Asymptotic normality of
estimators of the wavelet covariance and correlation is shown in order to construct
approximate confidence intervals. Both quantities are generalized into the wavelet
cross-covariance and cross-correlation in order to investigate possible lead/lag
relations in bivariate time series on a scale by scale basis.
Atmospheric measurements (such as station pressure and zonal wind speeds) from
a single station at Canton Island (2.8° S, 171.7° W) are put through a wavelet analysis
of covariance and are shown to provide similar results to those found in Madden and
Julian (1971) and multitaper spectral techniques. To investigate the possible
interaction between ENSO activity and the Madden–Julian oscillation, a daily "Southern
Oscillation Index" and station pressure series collected from Truk Island (7.4° N,
151.8° W) are analyzed. The wavelet cross-covariance nicely decomposes the usual
cross-covariance into scales which are more easily associated with atmospheric
phenomena. The time-varying wavelet variance and covariance are used to investigate
possible seasonal effects and changes due to ENSO activity.
TABLE OF CONTENTS

List of Figures
List of Tables

Chapter 1: Introduction
1.1 Motivation
1.1.1 Detecting Nonstationary Events in Long Range Dependence
1.1.2 Wavelet Analysis of Bivariate Time Series
1.2 Outline of Thesis
1.3 Contributions

Chapter 2: Long Memory Processes
2.1 Fractional Difference Processes
2.1.1 Definition
2.1.2 Simulation
2.2 Generalized Fractional Difference Processes
2.2.1 Definition
2.2.2 Simulation

Chapter 3: Discrete Wavelet Transforms and the Wavelet Variance
3.1 Wavelet Filters
3.1.1 The Haar Wavelet
3.1.2 Daubechies Families of Wavelet Filters
3.2 The Partial Discrete Wavelet Transform
3.2.1 Definition
3.2.2 Analysis of Variance
3.3 The Maximal Overlap Discrete Wavelet Transform
3.3.1 Comparison with the DWT
3.3.2 Definition
3.3.3 Analysis of Variance
3.4 Wavelet Variance
3.4.1 Definition
3.4.2 Equivalent Degrees of Freedom

Chapter 4: Testing Homogeneity of Variance
4.1 Spectral Analysis of DWT Wavelet Coefficients
4.1.1 Long Memory Processes
4.1.2 Short Memory Processes
4.1.3 Conclusions
4.2 Normalized Cumulative Sum of Squares Test Statistic
4.2.1 Definition
4.2.2 Data Analytic Thresholding
4.3 Testing Procedure
4.4 Testing for a Single Variance Change
4.4.1 Empirical Size
4.4.2 Empirical Power
4.4.3 Conclusions
4.5 Locating a Single Variance Change
4.5.1 Auxiliary Test
4.5.2 Simulation Study
4.5.3 Conclusions
4.6 Testing for Multiple Variance Changes
4.6.1 Iterated Algorithm
4.6.2 Empirical Power
4.6.3 Locating Multiple Variance Changes
4.6.4 Conclusions
4.7 Testing for a Change in the Long Memory Parameter
4.7.1 Introduction
4.7.2 Simulation Results
4.7.3 Conclusions

Chapter 5: Wavelet Analysis of Covariance
5.1 Definition of the Wavelet Covariance
5.1.1 Decomposition of Covariance
5.1.2 Wavelet Correlation
5.2 Estimating the Wavelet Covariance
5.2.1 The MODWT Estimator
5.2.2 The DWT Estimator
5.2.3 Estimating the Wavelet Cross-Covariance
5.3 Estimating the Wavelet Correlation and Cross-Correlation
5.4 Confidence Intervals for the Wavelet Covariance and Correlation
5.4.1 Wavelet Covariance
5.4.2 Wavelet Correlation
5.5 Comparison of Variance Estimators for the Wavelet Covariance
5.5.1 First Moment Properties of Ve j
5.5.2 First Moment Properties of Vej
5.5.3 Empirical Results
5.5.4 Conclusions

Chapter 6: Applications
6.1 Nile River Minimum Water Levels
6.1.1 Introduction
6.1.2 Wavelet Analysis
6.1.3 Testing for Homogeneity of Variance
6.2 Vertical Ocean Shear Measurements
6.3 Wavelet and Multitaper Spectral Analysis of the Madden–Julian Oscillation
6.3.1 Introduction
6.3.2 Univariate Spectral Analysis
6.3.3 Bivariate Spectral Analysis
6.3.4 Wavelet Analysis
6.3.5 Conclusions
6.4 Wavelet Analysis of Covariance Between the Southern Oscillation Index and Madden–Julian Oscillation
6.4.1 Introduction
6.4.2 Time-Domain and Spectral Analysis
6.4.3 Wavelet Analysis
6.4.4 Investigating Seasonal Variation in the Madden–Julian Oscillation
6.4.5 Investigating ENSO Variation of the Madden–Julian Oscillation

Chapter 7: Conclusions and Future Directions
7.1 Distributional Results for Testing Homogeneity of Variance
7.2 The Schwarz Information Criterion
7.3 Refinement of the Multiple Variance Change Testing Procedure
7.4 Testing Homogeneity of Covariance
7.5 Equivalent Degrees of Freedom for the Wavelet Covariance
7.6 Assessing Non-Gaussian/Non-Linear Processes
7.7 Final Comments

Bibliography

Appendix A: Fourier Theory and Filtering
A.1 The Discrete Fourier Transform
A.2 Properties of the DFT
A.3 Filtering of Sequences

Appendix B: Univariate Spectral Analysis
B.1 Introduction
B.2 Spectral Estimation
B.3 Equivalent Degrees of Freedom for a Spectral Estimator

Appendix C: Bivariate Spectral Analysis
C.1 Introduction
C.2 Spectral Estimation

LIST OF FIGURES

2.1 Spectral densities for fractional difference processes
2.2 Realizations of fractional difference processes
2.3 Autocovariance sequences for MA(q) approximations to fractional difference processes
2.4 Autocovariance sequences for MA(q) approximations, using the modified innovations variance $\hat{\sigma}^2$, to fractional difference processes
2.5 Realizations of generalized fractional difference processes
3.1 The Haar, D(4) and LA(8) wavelet filters
3.2 Squared gain functions for the Haar, D(4) and LA(8) wavelet filters
3.3 Quantile-quantile plots for the MODWT wavelet variance
3.4 Cumulative distribution functions for the MODWT wavelet variance
4.1 Theoretical spectra for the unit scale DWT wavelet coefficients of fractional difference processes
4.2 Theoretical spectra for the unit scale DWT wavelet coefficients of an AR(1) process
4.3 Theoretical spectra for the unit scale DWT wavelet coefficients of an MA(1) process
4.4 Rejection rates for fractional difference processes using white noise critical levels
4.5 Rejection rates for fractional difference processes using asymptotic critical levels
4.6 Rejection rates for fractional difference processes using the MODWT and equivalent degrees of freedom
4.7 Estimated locations of a single variance change at k = 100 for fractional difference processes
4.8 Estimated locations of variance change for fractional difference processes using the iterated cumulative sum of squares procedure
4.9 Estimated locations of multiple variance changes for fractional difference processes
4.10 Spectra of fractional difference processes and octave bands of the discrete wavelet transform
5.1 Estimates of Vej and Ve j, j = 1, ..., 6, for uncorrelated white noise processes
5.2 Estimates of Vej and Ve j, j = 1, ..., 6, minus their true value, for processes which satisfy a linear regression with delay
6.1 Nile River minimum water levels for 622 AD to 1284 AD
6.2 Multiresolution analysis of the Nile River minimum water levels using the D(4) wavelet filter and MODWT
6.3 Estimated D(4) wavelet variances for the Nile River minimum water levels before and after the year 722 AD
6.4 Normalized cumulative sum of squares from the MODWT for the Nile River minimum water levels
6.5 Plot of vertical shear measurements (inverse seconds) versus depth (meters)
6.6 Multiresolution analysis of the vertical ocean shear measurements
6.7 Estimated wavelet variance of the vertical ocean shear measurements
6.8 Estimated locations of variance change for the vertical ocean shear measurements
6.9 Climate stations in the tropical Pacific Ocean
6.10 Atmospheric time series collected from Canton Island (2.8° S, 171.7° W) over the period 1 June 1957 to 31 March 1967
6.11 Univariate spectral analysis of Canton Island data
6.12 Estimated co-spectra for the Canton Island data
6.13 Mean squared coherence of the Canton Island data
6.14 Multiresolution analysis of atmospheric time series collected at Canton Island (2.8° S, 171.7° W)
6.15 MODWT estimated wavelet variance for Canton Island time series
6.16 MODWT estimated wavelet correlation for Canton Island time series
6.17 Station pressure series for Truk Island (7.4° N, 151.8° W) and the Southern Oscillation Index
6.18 Estimated cross-correlation sequence for the Southern Oscillation Index and Truk Island station pressure series
6.19 Multiresolution analysis for the Truk Island station pressure series (1957–1992)
6.20 Multiresolution analysis for the daily Southern Oscillation Index (1957–1992)
6.21 MODWT estimated wavelet variance for the Southern Oscillation Index and Truk Island station pressure series
6.22 MODWT estimated wavelet correlation for the Southern Oscillation Index and Truk Island station pressure series. The transformed confidence intervals were computed using Section 5.4.2.
6.23 MODWT estimated wavelet cross-correlation for the Southern Oscillation Index and Truk Island station pressure series
6.24 Time-varying wavelet variance for the Truk Island station pressure series and SOI
6.25 Indicator of ENSO activity
6.26 Time-varying wavelet quantities, for the scale associated with the MJO
6.27 Time-varying wavelet quantities, for the scale associated with shorter periods than the MJO
7.1 Quantile-quantile plot comparing the Monte Carlo distributions of $D$ and $D_{XY}$

LIST OF TABLES

3.1 Scaling coefficients for the Daubechies least asymmetric wavelet filter of length L = 8
3.2 Equivalent degrees of freedom for the MODWT of white noise
3.3 Large sample approximation to the ratio of equivalent degrees of freedom j =N
4.1 Maximum dynamic range for the spectra of DWT wavelet coefficients when applied to fractional difference processes
4.2 Maximum dynamic range for the spectra of DWT wavelet coefficients when applied to AR(1) and MA(1) processes
4.3 Monte Carlo critical values for the test statistic $(N/2)^{1/2} D$
4.4 Performance of the cumulative sum of squares method for fractional difference processes with one variance change
4.5 Empirical power of the iterated CSS algorithm for fractional difference processes with one variance change
4.6 Empirical power of the iterated CSS algorithm for fractional difference processes with two variance changes
4.7 Rejection rates for a change in the long memory parameter of a fractional difference process
5.1 Variance of ^XY (j ), j = 1, ..., 6, for two white noise time series associated via linear regression with delay
5.2 Empirical bias and mean squared error of Vej, j = 1, ..., 6, for uncorrelated white noise processes
5.3 Empirical bias and mean squared error of Vej, j = 1, ..., 6, for white noise processes which are related via linear regression with delay
6.1 Results of testing the Nile River minimum water levels for homogeneity of variance

ACKNOWLEDGMENTS

I would like to thank those people most directly involved with this dissertation.
My two principal advisors, Professors Peter Guttorp and Don Percival,
guided me and never stopped demanding a high level of my understanding and
of my work. I would also like to thank the other members of my committee:
Professors Paul Sampson, Chris Bretherton, and Stephen Majeski.
I would especially like to thank my parents, Dona Farsdahl and Dennis
Whitcher. Their willingness to provide me with every resource possible
throughout my life in order to succeed is the reason why I am completing this
degree.
Finally, I would like to thank my fellow graduate students for good times,
stimulating conversations and unparalleled drinking.

Chapter 1
INTRODUCTION
1.1 Motivation
The analysis of time series has often been difficult when data do not conform to
well-studied theoretical concepts. One of the most common statistical properties violated
by time series data is stationarity. A time series is considered (weakly or second-order)
stationary when it has a mean and autocovariance sequence that do not vary with
time. It is not uncommon to encounter departures from stationarity in recorded time
series from the physical sciences, e.g., atmospheric science. There, seasonal effects
are not limited to the mean of a time series, but may also enter into the variance.
Some atmospheric variables are known, for instance, to exhibit increased variability
in the winter of each year. Other time series exhibit a persistence of correlation much
longer than can be explained by short memory (ARIMA) models; they are known
as long memory processes. The existence of data such as these, which defy current
statistical methods, motivates researchers to develop better theories and better tools
with which to analyze them. In this dissertation I present statistical techniques
that can be useful for detecting and evaluating nonstationary events in univariate
or bivariate time series. A complicating factor in many situations is the presence of
slowly decaying autocorrelations, or long memory, in a time series. The techniques
presented here are shown to perform well whether short memory or long memory
structure is assumed.
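In symbols, the definition of weak stationarity given in this paragraph can be written as below. The notation ($\mu$ for the mean, $s_\tau$ for the autocovariance at lag $\tau$) is an assumption made here for illustration and need not match the symbols used in later chapters.

```latex
\[
  E\{X_t\} = \mu
  \quad\text{and}\quad
  \operatorname{cov}\{X_t, X_{t+\tau}\} = s_\tau
  \qquad \text{for all integers } t \text{ and } \tau ,
\]
% where neither \mu nor s_\tau depends on the time index t.
```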
Another concept which arises in the physical sciences is the notion of 'multiscale
features.' That is, an observed time series may contain several phenomena, each
occurring on different time scales (these correspond to ranges of frequencies in the
Fourier domain). An example in atmospheric science would be that weather has a
very short time scale, around three days, while seasonal patterns occur around 365
days when measured at a single station away from the equator. Wavelet techniques
possess a natural ability to decompose time series into several sub-series which may
be associated with particular time scales. Hence, interpretation of features in complex
atmospheric time series may be made easier by first applying a wavelet transform and
subsequently interpreting each individual sub-series.
This dissertation grew out of a project to investigate atmospheric phenomena, such
as the Madden–Julian oscillation (MJO) (Madden and Julian 1971), using wavelet
techniques. While developing sound statistical quantities and tests, I have also tried
to keep in mind their application to relevant scientific questions and interpretability.

1.1.1 Detecting Nonstationary Events in Long Range Dependence


The first topic I consider is the detection and location of nonstationary events in
time series which may exhibit long memory structure. Here I fuse two established
techniques, wavelet analysis and change-point analysis, in order to extend our ability
to test hypotheses concerning the homogeneity of variance for a univariate time series,
with somewhat mild restrictions on its underlying spectrum.
First, change point detection is a well studied field in statistics, although
detecting a change in variance has a much smaller literature associated with it.
Techniques include, but are not restricted to, Fourier methods (Nuri and Herbst 1969),
cumulative sum of squares methods (Wichern et al. 1976; Hsu 1977), parametric
time series methods (Davis 1979; Tsay 1988), and Bayesian methods (Abraham and
Wei 1984). The cumulative sum of squares method is closely related to the notion
of testing using the empirical distribution function (Stephens 1970; Stephens 1986)
and the cumulative periodogram test; see, e.g., Priestley (1981, Sec. 6.1.4). Recently,
researchers have investigated detecting and locating not single changes of variance,
but multiple changes. Techniques used include a cumulative sum of squares method
(Inclan and Tiao 1994) and an information criterion method (Chen and Gupta 1997).
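As a rough illustration of the cumulative sum of squares idea mentioned in this paragraph, the sketch below computes a centered, normalized cumulative sum of squares and reports where it deviates most from a straight line. It is a minimal sketch under stated assumptions: the function name `css_statistic`, the centering `C_k / C_N - k / N`, and the toy data are illustrative choices, and the exact statistic defined in Chapter 4 may differ in detail.

```python
import numpy as np

def css_statistic(x):
    """Centered cumulative sum of squares (illustrative form, an assumption).

    Returns (stat, k_hat): sqrt(N/2) * max_k |C_k / C_N - k / N| and the
    1-based index k at which the maximum occurs.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    c = np.cumsum(x ** 2)             # C_k = x_1^2 + ... + x_k^2
    k = np.arange(1, n + 1)
    d = c / c[-1] - k / n             # zero everywhere if all squares are equal
    i = int(np.argmax(np.abs(d)))
    return float(np.sqrt(n / 2.0) * np.abs(d[i])), i + 1

# Toy series whose magnitude triples after the fourth observation.
toy_stat, toy_k = css_statistic(np.array([1, 1, 1, 1, 3, 3, 3, 3]))

# Simulated white noise whose standard deviation triples at t = 200.
rng = np.random.default_rng(0)
x = rng.standard_normal(400)
x[200:] *= 3.0
stat, k_hat = css_statistic(x)
print(toy_stat, toy_k, stat, k_hat)
```

For the toy series the largest deviation falls at k = 4, the last observation before the change. For the simulated series, the index `k_hat` serves as a crude estimate of the change location.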
Second, the discrete wavelet transform (DWT) has been shown to approximately
decorrelate time series with long memory structure; see, for example, Tewfik and
Kim (1992), McCoy and Walden (1996) and Wornell (1996). In fact, the DWT of a
long memory process produces several sub-series which are approximately white noise
sequences. Features which differ from this long memory structure, such as sudden
changes of variance, are retained in certain sub-series of wavelet coefficients.
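The approximate decorrelation described here can be checked numerically in a simple case. The sketch below assumes a fractional difference process with parameter d = 0.4 and the Haar wavelet at unit scale, both chosen only for illustration. It uses the standard recursion for the autocorrelation sequence of a fractional difference process, then derives the autocorrelation of the downsampled Haar coefficients W_t = (X_{2t+1} - X_{2t}) / sqrt(2). The helper names are hypothetical.

```python
import numpy as np

def fd_acf(d, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a fractional difference process,
    via rho_k = rho_{k-1} * (k - 1 + d) / (k - d)."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    for k in range(1, max_lag + 1):
        rho[k] = rho[k - 1] * (k - 1 + d) / (k - d)
    return rho

def haar_unit_scale_acf(rho, max_lag):
    """Autocorrelation of unit scale Haar DWT coefficients.

    cov(W_t, W_{t+tau}) is proportional to
    rho[2 tau] - (rho[|2 tau - 1|] + rho[2 tau + 1]) / 2,
    so rho must be available up to lag 2 * max_lag + 1.
    """
    tau = np.arange(max_lag + 1)
    cov = rho[2 * tau] - 0.5 * (rho[np.abs(2 * tau - 1)] + rho[2 * tau + 1])
    return cov / cov[0]

d = 0.4
rho = fd_acf(d, 21)
w_acf = haar_unit_scale_acf(rho, 10)
print("process lag-1 autocorrelation:", rho[1])
print("wavelet coefficient lag-1 autocorrelation:", w_acf[1])
```

Under these assumptions the process itself has lag-one autocorrelation d / (1 - d) = 2/3, while the unit scale wavelet coefficients have lag-one autocorrelation -3/52, roughly -0.06.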
We take advantage of this approximate "decorrelation" by the DWT and the
simplicity of the cumulative sum of squares method to test for homogeneity of variance,
on a scale by scale basis, of long memory processes in Chapter 4. This provides a
statistically sound technique of testing for nonstationary features without knowing
the exact nature of the correlation structure in a given time series. The methodology
developed in Chapter 4 is applied to the minimum water levels of the Nile River
(Toussoun 1925) in Section 6.1, a time series known to exhibit long-range dependence
(Mohr 1981; Graf 1983; Beran 1994, p. 22). I also analyze measurements of vertical
ocean shear (Percival and Guttorp 1994) in Section 6.2. While this series does not
appear to exhibit long memory structure, it is a good application of the multiple
change point testing procedure where exact knowledge of the underlying spectrum is
not required. The residual correlation in the wavelet coefficients of both short and
long memory processes is investigated in Section 4.1.

1.1.2 Wavelet Analysis of Bivariate Time Series


Atmospheric phenomena are not always discovered using solely univariate techniques.
For example, the Madden–Julian oscillation (MJO) (Madden and Julian 1971) was
found using bivariate spectral analysis, specifically the co-spectrum and magnitude
squared coherence. This oscillation has been documented as having a period of anywhere
from 30 to 60 days and has appeared in many studies in the Indian Ocean and tropical
Pacific Ocean; see, e.g., Madden and Julian (1994) for a review. This apparent
broadband nature of the oscillation has been hypothesized as being nonstationary,
so the broad peak observed in previous spectral analyses might be attributed to
the fading in-and-out of the oscillation over the time series of measurements. The
influence of El Niño–Southern Oscillation (ENSO) events has also been hypothesized
to affect the period of the MJO (Gray 1988; Kuhnel 1989). The ability of the wavelet
transform to capture variability in both time and scale may provide insight into the
nature of atmospheric phenomena such as the MJO, but first bivariate techniques
must be developed.
Wavelet methods for time series analysis have been performed primarily on
univariate processes, with the following exceptions. There has been some work in the
field of turbulence, in a thesis by Hudgins (1992) and a subsequent paper by
Hudgins, Friehe, and Mayer (1993). Hudgins used the output from wavelet transforms
to measure association between turbulent velocity components in the atmosphere.
A few articles also appear in the engineering literature from Japan. Kawata and
Arimoto (1996) were interested in signal matching for pattern recognition problems,
and Li and Nozaki (1997) used the wavelet cross-correlation of two velocity signals
in order to reveal similar structures on a scale by scale basis at particular delays and
times. Recently, Torrence and Compo (1998) discuss the cross-wavelet spectrum,
which is complex valued, and the cross-wavelet power, which is simply the magnitude
of their cross-wavelet spectrum. They also introduce confidence intervals for
their cross-wavelet power and compare the Southern Oscillation Index (SOI) with
the Niño3 sea surface temperature (SST). Both time series are measures of ENSO
activity; the SOI is defined to be the seasonally averaged pressure difference between
Darwin, Australia, and Tahiti, French Polynesia, and the Niño3 SST is the seasonal
SST averaged over the central Pacific (5° S–5° N, 90°–150° W).
The articles discussed above solely utilized the continuous wavelet transform.
Lindsay, Percival, and Rothrock (1996) defined the sample wavelet covariance for
the DWT and maximal overlap DWT (a redundant version of the DWT), along with
confidence intervals based on large sample results. These methods were applied to
the surface temperature and albedo of ice pack in the Beaufort Sea, off the coast of
Alaska and the Northwest Territory.
I introduce the wavelet covariance and correlation in Chapter 5, establishing their
asymptotic distributions for certain Gaussian processes. The wavelet covariance is
shown to decompose the covariance between two stationary processes on a scale by
scale basis. The wavelet cross-covariance and cross-correlation are also defined in
order to perform a more thorough scale by scale analysis of bivariate time series.
The same time series used by Madden and Julian (1971) are analyzed using bivariate
wavelet techniques and multitaper spectral methods in Section 6.3. A daily Southern
Oscillation Index is used as an indicator of ENSO activity and compared with the
station pressure at Truk Island (7.4° N, 151.8° W) in order to investigate the possible
relationship between ENSO events and the MJO (Section 6.4).
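A sample analogue of the scale by scale decomposition of covariance can be sketched with an orthonormal Haar DWT. Because the transform is orthonormal, the products of wavelet coefficients, summed within each level and divided by N, add up exactly to the (biased) sample covariance. This is only a sketch under stated assumptions: the Haar filter, the power-of-two length, the simulated data and the function name `haar_dwt` are illustrative. It is not the MODWT estimator developed in Chapter 5.

```python
import numpy as np

def haar_dwt(x):
    """Full Haar DWT of a series whose length is a power of two.

    Returns a list of wavelet coefficient arrays (finest scale first) and the
    single remaining scaling coefficient, which equals sum(x) / sqrt(N).
    """
    v = np.asarray(x, dtype=float)
    details = []
    while v.size > 1:
        details.append((v[1::2] - v[0::2]) / np.sqrt(2.0))
        v = (v[1::2] + v[0::2]) / np.sqrt(2.0)
    return details, v

# Two simulated series sharing a common component (illustrative data only).
rng = np.random.default_rng(1)
n = 256
common = rng.standard_normal(n)
x = common + rng.standard_normal(n)
y = common + rng.standard_normal(n)

wx, _ = haar_dwt(x)
wy, _ = haar_dwt(y)

# Contribution of each level to the sample covariance.
per_scale = np.array([np.sum(a * b) / n for a, b in zip(wx, wy)])
sample_cov = np.mean((x - x.mean()) * (y - y.mean()))
print(per_scale, per_scale.sum(), sample_cov)
```

The final scaling coefficient carries only the sample means, which is why the wavelet coefficients alone reproduce the mean-corrected cross product.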

1.2 Outline of Thesis


Fractional difference processes and generalized fractional difference processes, which
have a time-varying long memory parameter, are introduced in Chapter 2. Descriptions
of simulation methods are given for both types of processes, with a slight
modification to the simulation of generalized fractional difference processes as proposed
by Wang et al. (1997). Realizations are also provided for both types of processes.
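One simple way to generate an approximate realization of a fractional difference process with a constant parameter d is to truncate its infinite moving average representation at order q. The sketch below does this. It is an assumption made for illustration and is not necessarily the simulation method described in Chapter 2. The function names, the truncation order q = 2000 and the unit innovations variance are all hypothetical choices.

```python
import numpy as np

def fd_ma_weights(d, q):
    """First q + 1 moving average weights of (1 - B)^(-d):
    psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = np.empty(q + 1)
    psi[0] = 1.0
    for k in range(1, q + 1):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def simulate_fd(d, n, q=2000, rng=None):
    """Approximate FD(d) realization of length n from an MA(q) truncation."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(n + q)     # white noise innovations
    psi = fd_ma_weights(d, q)
    # 'valid' keeps only outputs that use all q + 1 weights, giving length n.
    return np.convolve(eps, psi, mode="valid")

psi = fd_ma_weights(0.25, 2000)
x = simulate_fd(0.25, 512, rng=np.random.default_rng(3))
print(x.shape, np.sum(psi ** 2))
```

The sum of squared weights is the variance of the truncated process. It falls slightly below the variance of the untruncated process, so a truncation of this kind distorts the variance and the autocovariances to some degree.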
Chapter 3 begins by introducing Daubechies families of compactly supported
wavelet filters. Material related to the Fourier transform and filtering is provided
in Appendix A. The DWT and maximal overlap DWT (MODWT) are then introduced,
with references provided for implementation in practice. A key property of
wavelet transforms is their conservation of energy. The wavelet variance is defined
to establish the decomposition of variance for a time series. An equivalent degrees of
freedom argument for the wavelet variance is also investigated.
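The conservation of energy can be illustrated with a Haar MODWT computed by circular filtering. The sketch assumes the Haar MODWT filters (1/2, -1/2) and (1/2, 1/2) applied with a lag of 2^(j-1) at level j. The function name `haar_modwt` and the simulated white noise are illustrative only. The mean of the squared coefficients at each level is used as a crude wavelet variance estimate that ignores boundary effects.

```python
import numpy as np

def haar_modwt(x, levels):
    """Haar MODWT with circular boundary treatment.

    Returns a list of wavelet coefficient arrays (levels 1..levels) and the
    scaling coefficients at the last level, each of the same length as x.
    """
    v = np.asarray(x, dtype=float)
    w_all = []
    for j in range(1, levels + 1):
        lagged = np.roll(v, 2 ** (j - 1))    # lagged[t] = v[t - 2^(j-1)], circularly
        w_all.append((v - lagged) / 2.0)     # wavelet (difference) coefficients
        v = (v + lagged) / 2.0               # scaling (average) coefficients
    return w_all, v

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
w_all, v = haar_modwt(x, levels=4)

# Energy of the coefficients matches the energy of the series.
energy = sum(np.sum(w ** 2) for w in w_all) + np.sum(v ** 2)
# Crude wavelet variance estimates, one per level.
wavelet_var = [np.mean(w ** 2) for w in w_all]
print(energy, np.sum(x ** 2), wavelet_var)
```

For unit variance white noise the Haar wavelet variance at level j is 1 / 2^j, so the estimates should roughly halve from one level to the next.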
Chapter 4 discusses testing for homogeneity of variance in univariate time series.
First, the spectral properties of wavelet coefficients from both short memory and
long memory processes are investigated. The ability of the DWT to approximately
decorrelate long memory processes with respect to a cumulative sum of squares test
statistic, on a scale by scale basis, is shown via Monte Carlo simulation. The
alternative hypothesis of a single variance change is then investigated. The MODWT is
employed to estimate the location of the variance change. Both the detection and
location procedures are then applied to multiple variance changes in a time series
using a recursive procedure. The ability of this method to detect a bona fide change
in the long memory parameter, when the variance of the process remains constant, is
also studied. A sudden change in the long memory parameter will produce changes
in the autocovariance sequence of the wavelet coefficients, at almost all levels of the
DWT, which should affect the cumulative sum of squares test statistic in quite a
different way from a simple variance change.
The extension of wavelet methodology to bivariate time series is explored in
Chapter 5. By defining the wavelet covariance and wavelet correlation between two
processes in a natural way, we can succinctly describe their relationship on a scale by
scale basis. The wavelet covariance is shown to decompose the covariance between
two stationary processes on a scale by scale basis. Asymptotic normality of the
wavelet covariance and correlation is proven, allowing for construction of
approximate confidence intervals for their estimators. The wavelet cross-covariance and
cross-correlation are also introduced. The DWT estimator of the wavelet covariance
is shown to suffer from bias depending on the delay between the two time series.
Chapter 6 contains examples of how the methodology introduced in Chapters 4
and 5 performs on real time series. The Nile River minimum water levels (Toussoun
1925) are analyzed to find a sudden change of variance around 720 AD, which
corresponds nicely to the construction of an instrument to measure the river levels in
715 AD. Vertical ocean shear measurements (Percival and Guttorp 1994) are analyzed
to determine multiple variance changes in the first 5 scales. Atmospheric time series
from the tropical Pacific Ocean (provided by Rol Madden at the National Center for
Atmospheric Research) are analyzed in order to contrast the results from a wavelet
analysis of these data to results obtained from bivariate spectral analysis in Madden
and Julian (1971). Two series, one a measure of ENSO and the other for the MJO,
are analyzed in order to investigate the possible association between ENSO events
and the frequency and/or magnitude of the MJO.
Conclusions from the research presented here are given in Chapter 7, along with
open questions and future directions.

1.3 Contributions
The following is a list of original contributions in this dissertation:

Investigation of the spectral properties of the DWT wavelet coefficients when
applied to both short memory (ARMA) and long memory (fractional difference)
processes (Section 4.1).

Demonstration, through Monte Carlo simulation, that the DWT of fractional
difference processes produces approximately uncorrelated output, on a scale by
scale basis, with respect to a Kolmogorov-type test statistic (Chapter 4).

Proof that the wavelet covariance decomposes the covariance between two
stationary processes on a scale by scale basis (Section 5.1.1).

Proof that the MODWT estimator of the wavelet covariance is asymptotically
normally distributed when applied to nonstationary Gaussian processes whose
dth order backward differences are short memory stationary (Section 5.2.1).
This allows for the construction of confidence intervals when estimating the
wavelet covariance.

Proof that the MODWT estimator of the wavelet correlation is asymptotically
normally distributed when applied to nonstationary Gaussian processes whose
dth order backward differences are short memory stationary (Section 5.3). This
allows for the construction of confidence intervals when estimating the wavelet
correlation.

Demonstration that the lack of shift invariance of the DWT introduces bias into
the variance of the DWT estimator of the wavelet covariance (Section 5.2.2).

Demonstration of evidence for a change in the variance of the Nile River
minimum water levels (Toussoun 1925) instead of a change in the long memory
parameter, as proposed in Beran and Terrin (1996) (Section 6.1).

Investigation of the possible interaction between ENSO events and the MJO
using a wavelet analysis of covariance developed in this dissertation (Section 6.4).
Chapter 2
LONG MEMORY PROCESSES
Our current understanding, and more importantly awareness, that natural
phenomena may exhibit long-range dependence is due to the pioneering work by Hurst
(1951). While looking at time series from the physical sciences (e.g., rainfall, tree
rings, river levels, etc.) he noticed that his $R/S$-statistic, on a logarithmic scale, was
randomly scattered around a line with slope $H > 1/2$ for large sample sizes. The
$R/S$-statistic is the rescaled adjusted range and was used to calculate the ideal capacity of
a water reservoir from time $t$ to time $t + k$. Loosely, the numerator $R$ (or adjusted
range) measures the cumulative inflow to the reservoir and the denominator $S$ is
proportional to the standard deviation of all measured inflows. For a stationary process
with short-term dependence, $R/S$ should be proportional to $k^{1/2}$ for $k$ large, so that
$\log R/S$ plotted against $\log k$ has slope $1/2$. The discovery of behavior proportional to
$k^H$, with $H > 1/2$, was in direct contradiction to the theory of such processes at the
time. This discovery is known as the Hurst effect.
Mandelbrot and co-workers (Mandelbrot and van Ness 1968; Mandelbrot and
Wallis 1969) showed that the Hurst effect can be modeled by fractional Gaussian noise
with self-similarity parameter $0 < H < 1$ ($H$ being for Hurst). More information
about the history of long memory processes can be found in Beran (1994). Examples
of such behavior can be found in a variety of disciplines, such as geophysics (Percival
and Guttorp 1994; Walden 1994), hydrology (Lawrence and Kottegoda 1977; Hosking
1984), economics (Jensen 1994) and engineering (Mehrabi, Rassamdana, and Sahimi
1997; Abry and Veitch 1998). In this dissertation, I look at the Nile River
minimum water levels (Toussoun 1925) and vertical shear measurements in the ocean in
Chapter 6.
This chapter is divided into two parts, fractional difference processes and
generalized fractional difference processes. The latter is a generalization of the former where
the difference parameter $d$ is allowed to vary with time. Along with brief descriptions,
simulation techniques for both types of processes are also provided.

2.1 Fractional Difference Processes


2.1.1 Definition
In the early 1980s, a family of models was developed to help analyze long memory
processes. Granger and Joyeux (1980) and Hosking (1981) introduced fractional
ARIMA models, which are a generalization of the standard ARIMA($p, d, q$) models
defined by Box and Jenkins (1976).
Let $\{X_t\}$ be a stationary process whose $d$th order backward difference
\[
(1 - B)^d X_t = \sum_{k=0}^{\infty} \binom{d}{k} (-1)^k X_{t-k} = \epsilon_t
\]
is a stationary process, where $d$ is a real number and
\[
\binom{a}{b} \equiv \frac{a!}{b!\,(a-b)!} = \frac{\Gamma(a+1)}{\Gamma(b+1)\,\Gamma(a-b+1)}.
\]
For example, the first order difference is $Y_t = X_t - X_{t-1}$ when $d = 1$. If $\{\epsilon_t\}$ is a white
noise process with variance $\sigma_\epsilon^2$, then $\{X_t\}$ is the simplest case of a fractional ARIMA
process, a fractional ARIMA($0, d, 0$). We will refer to such processes as fractional
difference processes from now on.
Now let $\{X_t\}$ be a zero mean fractional ARIMA($0, d, 0$) process with $-1/2 < d < 1/2$
(for simplicity, the sampling interval is $\Delta t = 1$). This process is stationary and
invertible (Hosking 1981). The autocovariance sequence (acvs) of $\{X_t\}$ is defined to
be
\[
s_\tau \equiv E\{X_t X_{t-\tau}\}
       = \frac{\sigma_\epsilon^2 \, (-1)^\tau \, \Gamma(1 - 2d)}{\Gamma(1 + \tau - d)\,\Gamma(1 - \tau - d)},
\tag{2.1}
\]
which means the variance is given by
\[
\mathrm{Var}\{X_t\} = s_0 = \frac{\sigma_\epsilon^2 \, \Gamma(1 - 2d)}{[\Gamma(1 - d)]^2}.
\tag{2.2}
\]
The spectral density of $\{X_t\}$ is
\[
S_X(f) = \sigma_\epsilon^2 \, |2 \sin(\pi f)|^{-2d} \quad \text{for } -\tfrac{1}{2} < f < \tfrac{1}{2},
\tag{2.3}
\]
so that $S_X(f) \propto f^{-2d}$ approximately as $f \to 0$ and, thus, the spectral density is
approximately linear on the log scale. This property can be seen in Figure 2.1 for
various fractional difference processes. When $0 < d < 1/2$, this spectral density has a
pole at zero, in which case the process exhibits slowly decaying autocovariances and
constitutes a simple example of a long memory process.
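
As an illustration of Equations (2.1) and (2.2), the following minimal Python sketch evaluates the acvs of a fractional difference process. It is not the code used in this dissertation. The function name `fd_acvs` and its arguments are hypothetical, and the lag-to-lag recursion $s_\tau = s_{\tau-1}(\tau - 1 + d)/(\tau - d)$ is an assumption obtained by taking the ratio of consecutive terms in Equation (2.1), which avoids evaluating gamma functions at every lag.

```python
import math


def fd_acvs(d, n_lags, sigma2=1.0):
    """Return [s_0, ..., s_{n_lags}] for a fractional difference process.

    Hypothetical helper: d is the difference parameter in (-1/2, 1/2) and
    sigma2 is the innovations variance.
    """
    if not -0.5 < d < 0.5:
        raise ValueError("d must lie in (-1/2, 1/2)")
    # Lag 0 from Equation (2.2); log-gamma keeps the ratio numerically stable.
    s = [sigma2 * math.exp(math.lgamma(1 - 2 * d) - 2 * math.lgamma(1 - d))]
    # Ratio of consecutive lags in Equation (2.1).
    for tau in range(1, n_lags + 1):
        s.append(s[-1] * (tau - 1 + d) / (tau - d))
    return s


if __name__ == "__main__":
    for d in (0.05, 0.25, 0.40, 0.45):
        print(d, [round(v, 4) for v in fd_acvs(d, 5)])
```

The slow decay of the printed values as $d$ approaches $1/2$ is the long memory behavior described above.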

2.1.2 Simulation
Davies and Harte (1987) describe a method for simulating certain stationary Gaussian
time series of length $N$ with autocovariances $\gamma_0, \gamma_1, \ldots, \gamma_{N-1}$. The method is based
on the Fourier transform and is as follows (Beran 1994, pp. 216-217):

1. Define
\[
\lambda_k \equiv \frac{2\pi(k - 1)}{2n - 2},
\]
$k = 1, \ldots, 2n - 2$, and the discrete Fourier transform $\Gamma_k$ of the two-sided
sequence of autocovariances $\gamma_0, \gamma_1, \ldots, \gamma_{n-2}, \gamma_{n-1}, \gamma_{n-2}, \ldots, \gamma_1$,
\[
\Gamma_k \equiv \sum_{j=1}^{n-1} \gamma_{j-1} e^{i(j-1)\lambda_k}
          + \sum_{j=n}^{2n-2} \gamma_{2n-j-1} e^{i(j-1)\lambda_k}
\tag{2.4}
\]
for $k = 1, \ldots, 2n - 2$.

2. Check to see that $\Gamma_k > 0$ for all $k = 1, \ldots, 2n - 2$. If this condition does not
hold, the Davies-Harte method will not work for this time series (this is not a
problem with fractional difference processes).

Figure 2.1: Spectral densities (in dB) for fractional difference processes with
d = 0.05, 0.25, 0.40 and 0.45. The x-axis (frequency) is displayed on the log2 scale.


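3. Simulate two independent sequences of normal random variables, $U_1, U_2, \ldots, U_n$
and $V_1, V_2, \ldots, V_n$, such that
\[
\mathrm{Var}\{U_1\} = \mathrm{Var}\{U_n\} = 2
\]
and, for $k \neq 1, n$,
\[
\mathrm{Var}\{U_k\} = \mathrm{Var}\{V_k\} = 1.
\]
Define $V_1 = V_n = 0$ and the sequence of complex random variables $\{Z_k\}$ by
\[
Z_k \equiv U_k + iV_k, \quad k = 1, \ldots, n,
\]
and
\[
Z_k \equiv U_{2n-k} - iV_{2n-k}
\]
for $k = n + 1, \ldots, 2n - 2$.

4. For $t = 1, \ldots, n$ define
\[
X_t \equiv \frac{1}{2\sqrt{n - 1}} \sum_{k=1}^{2n-2} \sqrt{\Gamma_k} \, e^{i(t-1)\lambda_k} Z_k.
\tag{2.5}
\]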

The series $\{X_t\}$ has the desired covariance structure. This method has a
computational advantage since Equations (2.4) and (2.5) can be calculated using the fast
Fourier transform. Percival (1992) compares this method to others in the context of
generating a stationary Gaussian process with specified spectrum.
S-plus code for the Davies-Harte method, along with documentation provided by
Martin Maechler and Jan Beran, can be obtained via the World Wide Web from
StatLib at http://lib.stat.cmu.edu/S/ under the title beran. Realizations of
length 512 from several fractional difference processes (generated in S-plus) are
displayed in Figure 2.2. As the long memory parameter increases in magnitude, the
fractional difference process appears to have more and more low frequency content.
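
The four steps above can be sketched in Python with NumPy. This is not the S-plus code referred to above. The names `fd_acvs` and `davies_harte` are hypothetical, and zero-based array indices replace the one-based indices of Equations (2.4) and (2.5). One further assumption is that tiny negative values of $\Gamma_k$ caused by rounding are clipped to zero rather than treated as a failure of step 2.

```python
import numpy as np
from math import lgamma, exp


def fd_acvs(d, n):
    """acvs s_0, ..., s_{n-1} of a fractional difference process (unit innovations variance)."""
    s = np.empty(n)
    s[0] = exp(lgamma(1 - 2 * d) - 2 * lgamma(1 - d))
    for tau in range(1, n):
        s[tau] = s[tau - 1] * (tau - 1 + d) / (tau - d)
    return s


def davies_harte(acvs, rng):
    """Simulate one Gaussian series of length n = len(acvs) with the given acvs."""
    acvs = np.asarray(acvs, dtype=float)
    n = acvs.size
    m = 2 * n - 2
    # Step 1: DFT of the two-sided sequence s_0, ..., s_{n-1}, s_{n-2}, ..., s_1.
    circ = np.concatenate([acvs, acvs[-2:0:-1]])
    gam = np.fft.fft(circ).real  # real because the sequence is symmetric
    # Step 2: the method requires a nonnegative transform.
    if gam.min() < -1e-10 * gam.max():
        raise ValueError("Davies-Harte method not applicable to this acvs")
    gam = np.clip(gam, 0.0, None)
    # Step 3: complex Gaussian sequence with the stated variances and symmetry.
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)
    u[0] *= np.sqrt(2.0)
    u[-1] *= np.sqrt(2.0)
    v[0] = v[-1] = 0.0
    z = np.empty(m, dtype=complex)
    z[:n] = u + 1j * v
    z[n:] = np.conj(z[n - 2:0:-1])
    # Step 4: Equation (2.5); m * ifft supplies the sum, and m / (2 sqrt(n-1)) = sqrt(n-1).
    x = np.sqrt(n - 1) * np.fft.ifft(np.sqrt(gam) * z)
    return x.real[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(1998)
    x = davies_harte(fd_acvs(0.40, 512), rng)
    print(x[:5])
```

Both Fourier sums are evaluated with the fast Fourier transform, which is the computational advantage noted above.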

2.2 Generalized Fractional Difference Processes


2.2.1 Definition

A process defined by Equation (2.3) has a long memory parameter which is constant
over time. We introduce a related process where the long memory parameter $d_t$ is
a discrete function of time, called a generalized fractional difference process (gfdp).
This process has recently appeared in a paper by Wang, Cavanaugh, and Song (1997).
We will utilize these processes later on when we investigate how the test for
homogeneity of variance reacts to a sudden change in the long memory parameter of a
generalized fractional difference process.

Figure 2.2: Realizations of fractional difference processes (N = 512) with
d = 0.05, 0.25, 0.40 and 0.45.


2.2.2 Simulation
Hosking (1981) looked at representing a fractional ARIMA as an infinite
autoregressive process or infinite moving average process with coefficients which may be given
explicitly; see also Beran (1994, pp. 64-65). We utilize the infinite moving average
representation in order to simulate generalized fractional difference processes. Let
$\{X_t\}$ be a generalized fractional difference process with long memory parameter $\{d_t\}$,
then it has an infinite moving average representation
\[
X_t = \sum_{k=0}^{\infty} a_{t,k} \, \epsilon_{t+N-k}, \quad t = 1, \ldots, N,
\]
where $\epsilon_t$, $t = 1, 2, \ldots$, is a white noise sequence and
\[
a_{t,k} \equiv \frac{\Gamma(k + d_t)}{\Gamma(k + 1)\,\Gamma(d_t)}.
\tag{2.6}
\]
For $k \to \infty$ we have the following approximation
\[
a_{t,k} \approx \frac{1}{\Gamma(d_t)} \, k^{d_t - 1}
\]
by Stirling's formula.
We now provide an algorithm for simulating such a process. For a realization
$X_t$, $t = 1, \ldots, N$, of a portion of a generalized fractional difference process, a white
noise sequence $\epsilon_t$, $t = 1, \ldots, mN$, is generated. The parameter $m > 1$ is a positive
integer that determines the order of the moving-average model used to generate the
realization $X_t$. When simulating generalized fractional difference processes in this
dissertation, I used $m = 2$. Once the length of previous observations is specified, each
observation $X_t$ is simply the moving average of the previous $(m - 1)N$ observations,
i.e.,
\[
X_t = \sum_{k=0}^{(m-1)N-1} a_{t,k} \, \epsilon_{t-k}, \quad t = 1, \ldots, N.
\tag{2.7}
\]

The coefficients $a_{t,k}$ are functions of the time-varying long memory sequence $\{d_t\}$ and
can be defined recursively via
\[
a_{t,k} = a_{t,k-1} \, \frac{k - 1 + d_t}{k} \quad \text{for } k = 1, \ldots, N - 1,
\]
where $a_{t,0} = 1$. This allows for fast computation since computing the gamma function
in Equation (2.6) explicitly is inefficient.
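
A minimal Python sketch of this truncated moving-average simulation follows. It is not the code used in this dissertation, and the names `ma_coefficients` and `simulate_gfdp` are hypothetical. As an indexing assumption, the white noise is stored in a zero-based array of length $mN$ and each $X_t$ is formed from the $(m-1)N$ most recent innovations, so that every index in Equation (2.7) is valid.

```python
import numpy as np


def ma_coefficients(d, n_coef):
    """a_0, ..., a_{n_coef-1} from the recursion a_k = a_{k-1} (k - 1 + d) / k, a_0 = 1."""
    a = np.empty(n_coef)
    a[0] = 1.0
    for k in range(1, n_coef):
        a[k] = a[k - 1] * (k - 1 + d) / k
    return a


def simulate_gfdp(d_seq, m=2, sigma=1.0, rng=None):
    """Simulate X_1, ..., X_N with time-varying long memory parameter d_seq.

    Returns the realization and the white noise that generated it.
    """
    rng = np.random.default_rng() if rng is None else rng
    d_seq = np.asarray(d_seq, dtype=float)
    n = d_seq.size
    q = (m - 1) * n                      # number of moving-average coefficients
    eps = sigma * rng.standard_normal(m * n)
    x = np.empty(n)
    for t in range(n):
        a = ma_coefficients(d_seq[t], q)
        # Most recent innovation first: eps[t+q], eps[t+q-1], ..., eps[t+1].
        window = eps[t + 1:t + q + 1][::-1]
        x[t] = a @ window
    return x, eps


if __name__ == "__main__":
    n = 512
    d_seq = np.where(np.arange(n) < n // 2, 0.05, 0.40)  # sudden change at t = N/2
    x, _ = simulate_gfdp(d_seq, m=2, rng=np.random.default_rng(1))
    print(x[:5])
```

Recomputing the coefficients at every time point keeps the sketch short. Caching them for each distinct value of $d_t$ would be faster when $d_t$ takes only a few values.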
As previously stated, this technique is based on an infinite moving average process
but implemented as a finite moving average process. A simple check to see how large
$m$ needs to be, in order to reasonably simulate fractional difference processes, is
to compare the autocovariance sequence (acvs) between the true process and the
moving-average approximation. The acvs of an MA($q$) process, such as the one given
in Equation (2.7), is
\[
s_k^{(\mathrm{MA})} =
\begin{cases}
\sigma_\epsilon^2 \sum_{j=0}^{q-|k|} a_{t,j} \, a_{t,j+|k|}, & |k| \leq q, \\
0, & |k| > q
\end{cases}
\tag{2.8}
\]
(see, e.g., Brockwell and Davis (1991, p. 79)), where $d_t$ does not vary with time. The
exact acvs for a fractional difference process $\{s_k^{(\mathrm{fdp})}\}$ is given in Equation (2.1). The
acvs for an MA($q$) process, computed using order $q = 512$, 1024, 2048 and 4096, was
compared with $\{s_k^{(\mathrm{fdp})}\}$ for long memory parameters $d = 0.05$, 0.25, 0.4 and 0.45; see
Figure 2.3. For fractional difference processes with small long memory parameters,
the MA($q$) approximation is very good. However, when heavy amounts of
autocorrelation are present even the MA(4096) process does not perform well. This makes the
approximation quite crude with respect to fractional difference processes as $d \to 0.5$.
There is one free parameter to help us improve the fit between $\{s_k^{(\mathrm{fdp})}\}$ and $\{s_k^{(\mathrm{MA})}\}$,
namely, the innovations variance $\sigma_\epsilon^2$. If we adopt the least-squares approach, then we
want to find a $\sigma_\epsilon^2$ such that
\[
\sum_{k=0}^{N-1} \left[ s_k^{(\mathrm{fdp})} - \sigma_\epsilon^2 \, s_k^{(\mathrm{MA})} \right]^2
\]
is minimized. Differentiating with respect to $\sigma_\epsilon^2$, setting the equation to zero, and
solving for $\sigma_\epsilon^2$ yields the usual estimator from linear regression theory, i.e.,
\[
\hat{\sigma}_\epsilon^2 \equiv
\frac{\sum_{k=0}^{N-1} s_k^{(\mathrm{fdp})} s_k^{(\mathrm{MA})}}
     {\sum_{k=0}^{N-1} \left[ s_k^{(\mathrm{MA})} \right]^2}.
\]

Figure 2.3: Autocovariance sequences for MA(q) approximations to fractional
difference processes, with orders ranging from q = 512 to q = 4096. Panels show
d = 0.05, 0.25, 0.40 and 0.45; each panel plots autocovariance against lag (0 to 30)
for the exact acvs and for q = 512, 1024, 2048 and 4096.

Figure 2.4: Autocovariance sequences for MA(q) approximations, using the modified
innovations variance $\hat{\sigma}_\epsilon^2$, to fractional difference processes, with orders ranging from
q = 512 to q = 4096. Panels and axes are as in Figure 2.3.


Hence, inserting $\hat{\sigma}_\epsilon^2$ into Equation (2.8) will produce an improved, in the
least-squares sense, autocovariance sequence. Simulation of fractional difference processes
is straightforward, by generating sequences of white noise with variance $\hat{\sigma}_\epsilon^2$ and
utilizing Equation (2.7), where $\{d_t\}$ does not vary with time. For generalized fractional
difference processes, sequences of white noise with unit variance are generated and
the innovations standard deviation $\hat{\sigma}_{\epsilon,t}$ enters into Equation (2.7) and varies with
$\{d_t\}$.
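
The least-squares adjustment can be sketched as follows. This is an illustration only. The helper names are hypothetical, the MA acvs is evaluated with unit innovations variance as in Equation (2.8), and the exact acvs uses the lag recursion implied by Equation (2.1).

```python
import numpy as np
from math import lgamma, exp


def fd_acvs(d, n_lags):
    """Exact acvs s_0, ..., s_{n_lags-1} of a fractional difference process."""
    s = np.empty(n_lags)
    s[0] = exp(lgamma(1 - 2 * d) - 2 * lgamma(1 - d))
    for tau in range(1, n_lags):
        s[tau] = s[tau - 1] * (tau - 1 + d) / (tau - d)
    return s


def ma_coefficients(d, q):
    """Coefficients a_0, ..., a_q of the truncated moving-average representation."""
    a = np.empty(q + 1)
    a[0] = 1.0
    for k in range(1, q + 1):
        a[k] = a[k - 1] * (k - 1 + d) / k
    return a


def ma_acvs(a, n_lags):
    """acvs of the MA(q) process at lags 0, ..., n_lags-1 (unit innovations variance)."""
    return np.array([a[:a.size - k] @ a[k:] for k in range(n_lags)])


def ls_innovations_variance(s_fdp, s_ma):
    """Least-squares estimate: sum(s_fdp * s_ma) / sum(s_ma ** 2)."""
    return (s_fdp @ s_ma) / (s_ma @ s_ma)


if __name__ == "__main__":
    for d in (0.05, 0.25, 0.40, 0.45):
        s_exact = fd_acvs(d, 30)
        s_ma = ma_acvs(ma_coefficients(d, 512), 30)
        print(d, ls_innovations_variance(s_exact, s_ma))
```

Because the truncated sum in Equation (2.8) omits only positive terms when $0 < d < 1/2$, the estimate is never below one, and it grows as $d \to 0.5$.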
This procedure was applied to the autocovariance sequences displayed in
Figure 2.3, using only the first 30 lags in the regression, with the results shown in
Figure 2.4. There is no change in the autocovariance sequences when $d = 0.05$
or 0.25; these approximations were adequate without modification even for $q = 512$.
A marked improvement is seen for $d = 0.40$, where the approximation at initial lags is
only slightly higher and, for larger lags, is not nearly as low as before. This pattern is
even more apparent for $d = 0.45$, where the new autocovariance sequences do not fit
the exact acvs well, but better than in Figure 2.3. Whereas modifying the innovations
variance improves this method of simulation for a larger interval of $d$, as $d \to 0.5$,
any finite moving-average model must have an extremely large order to adequately
capture the amount of correlation structure present in a fractional difference process.
Figure 2.5 shows realizations of four generalized fractional difference processes
($N = 512$), where $m = 2$ and the modified innovations variance $\hat{\sigma}_\epsilon^2$ is utilized, with
a sudden shift in the long memory parameter at $t = N/2$. Whereas we are only
interested in generating fractional difference processes with these sudden changes in
the difference parameter, this method is able to produce processes with parameters
which change linearly or even nonlinearly with time. Examples of such processes can
be found in Wang et al. (1997).

Figure 2.5: Realizations of generalized fractional difference processes (N = 512).
Panels are labeled d: 0.05 / 0.05, 0.05 / 0.25, 0.05 / 0.40 and 0.05 / 0.45.

Chapter 3
DISCRETE WAVELET TRANSFORMS AND THE WAVELET VARIANCE
The wavelet transform is a powerful mathematical tool that is receiving more and
more attention from the statistical community. While most work is being done in the
engineering and physical sciences, wavelet transforms have already proven useful in
well established statistical fields such as nonparametric regression, classification, and
time series analysis. The ground-breaking work of Donoho and co-workers (Donoho
1993; Donoho and Johnstone 1994; Donoho 1995; Donoho, Johnstone, Kerkyacharian,
and Picard 1995, etc.) introduced statisticians to wavelet transforms in the context of
signal estimation and wavelet shrinkage. In the following chapters, I extend wavelet
methodology in the area of time series analysis. This chapter introduces some basic
concepts of wavelet methodology and briefly investigates how well the equivalent
degrees of freedom argument holds with respect to the wavelet variance.
Wavelet methods are best understood when contrasted against classical Fourier
methodology. Two texts with introductory material on Fourier theory are Percival
and Walden (1993) and Briggs and Henson (1995). Classic texts on spectral analysis
of time series include Koopmans (1974), Bloomfield (1976) and Priestley (1981);
Anderson (1971) and Fuller (1996) focus on the time-domain analysis of time series.
These texts will be utilized when the details of a particular concept are beyond
the scope of this dissertation. Pertinent concepts, such as Fourier theory, filtering
and spectral analysis, are presented in the first two appendices at the end of this
dissertation for direct reference. Introductory texts on wavelet theory abound, and
most (if not all) are from an engineering perspective, except for the recent work of
Ogden (1997). Chui (1997) provides a basic synopsis of wavelet theory, while Vetterli
and Kovačević (1995) give a more thorough account from an engineering perspective.
The mathematically rigorous book by Daubechies (1992) contains a wealth of details
on her families of wavelet filters.
By utilizing notation and concepts from the first two appendices, we introduce the
Haar wavelet filter and two families of Daubechies wavelets in Section 3.1. The partial
discrete wavelet transform (DWT) and maximal overlap discrete wavelet transform
(MODWT) are briefly introduced. Algorithms for the DWT abound in the
literature; for a detailed computer algorithm of the MODWT see Appendix A of Percival
and Mofjeld (1997). The bulk of the background material on wavelets, DWT and
MODWT (including notation) is a synopsis derived from Percival and Walden (1999).
The wavelet variance is introduced along with the concept of equivalent degrees of
freedom for a time series. The distribution of the wavelet variance under the
equivalent degrees of freedom argument is compared with exact methods from the theory
of quadratic forms of normal random variables.

3.1 Wavelet Filters

3.1.1 The Haar Wavelet

The first wavelet filter, the Haar wavelet (Haar 1910), remained in relative obscurity
until the convergence of several disciplines to form what we now know in a broad
sense as wavelet methodology. It is a filter of width $L = 2$ which can be succinctly
defined by its scaling coefficients
\[
g_0 = g_1 = \frac{1}{\sqrt{2}},
\]
or equivalently by its wavelet coefficients $h_0 = 1/\sqrt{2}$ and $h_1 = -1/\sqrt{2}$ through the
quadrature mirror relationship
\[
h_l = (-1)^l g_{L-1-l} \quad \text{for } l = 0, \ldots, L - 1
\tag{3.1}
\]
(the convention of $g_l$ corresponding to the low-pass or scaling coefficients and $h_l$
corresponding to the high-pass or wavelet coefficients will be adhered to throughout
this dissertation). The Haar wavelet is special since it is the only compactly
supported orthonormal wavelet that is symmetric (Daubechies 1992, Ch. 8). It is also
useful for presenting the basic properties shared by all Daubechies wavelet filters:
orthonormality and orthogonality to even shifts. The former property is seen by
\[
\sum_{l=0}^{L-1} h_l^2 = 1
\tag{3.2}
\]
and the latter through a similar calculation
\[
\sum_{l=0}^{L-1} h_l h_{l+2k} = \sum_{l=-\infty}^{\infty} h_l h_{l+2k} = 0
\tag{3.3}
\]
for all non-zero integers $k$, where by definition $h_l = 0$ for $l < 0$ and $l \geq L$.
Although the Haar wavelet filter is easy to visualize and implement, it is
inadequate for most real-world applications in that it is a poor approximation to an ideal
band-pass filter. This can be seen, for example, in the analysis of vertical ocean shear
measurements (Percival and Guttorp 1994). Other wavelets of even width $L \geq 4$ have
been developed in the past decade that yield much better approximations.

3.1.2 Daubechies Families of Wavelet Filters


A wavelet family consists of all wavelet basis vectors, over all scales and translations,
derived from a single wavelet filter (or mother wavelet). Two wavelet families which
will be used exclusively in later chapters were developed by I. Daubechies. They are
the extremal phase and least asymmetric wavelets. When referring to these wavelets
in the future, `D(L)' and `LA(L)' will be used to denote Daubechies extremal phase
and least asymmetric wavelet filters of length $L$, respectively. The D(2) wavelet is
equivalent to the Haar wavelet. In general, let
\[
H(f) \equiv \sum_{t=0}^{L-1} h_t e^{-i2\pi ft}
\quad \text{and} \quad
G(f) \equiv \sum_{t=0}^{L-1} g_t e^{-i2\pi ft}
\]
define the transfer functions for the wavelet and scaling coefficients, respectively.
Recall that any arbitrary transfer function may be factored into the product of its
magnitude component and a complex exponential containing the phase component
(cf. Section A.3). The D(L) and LA(L) wavelet filters are identical in the magnitude
of their transfer functions, only differing in their phase properties. The
manipulation of these phase properties is known as spectral factorization (Percival and Walden
1999, Sec. 4.8).
Wavelet filter coefficients for the D(4) wavelet, at unit scale, are defined to be
\[
h_0 = \frac{1 - \sqrt{3}}{4\sqrt{2}}, \quad
h_1 = \frac{-3 + \sqrt{3}}{4\sqrt{2}}, \quad
h_2 = \frac{3 + \sqrt{3}}{4\sqrt{2}} \quad \text{and} \quad
h_3 = \frac{-1 - \sqrt{3}}{4\sqrt{2}},
\]
and the scaling coefficients for the LA(8) are given in Table 3.1. Recall that the scaling
filter is related to the wavelet filter via the quadrature mirror filter relationship given
by Equation (3.1). The scaling coefficients defining Daubechies families of wavelet
filters of varying lengths can be found in Daubechies (1992, Ch. 6).
More information about the properties of these wavelets can be seen when
comparing the squared gain functions of the wavelet and scaling coefficients
\[
\mathcal{H}(f) \equiv |H(f)|^2 = H(f) H^*(f)
\quad \text{and} \quad
\mathcal{G}(f) \equiv |G(f)|^2 = G(f) G^*(f),
\]
respectively. The two transfer functions, and hence the squared gain functions, are
related through the quadrature mirror relationship (Equation (3.1)) such that
\[
G(f) = e^{-i2\pi f(L-1)} H\!\left(\tfrac{1}{2} - f\right)
\quad \text{and hence} \quad
\mathcal{G}(f) = \mathcal{H}\!\left(\tfrac{1}{2} - f\right).
\tag{3.4}
\]

Table 3.1: Scaling coefficients for the Daubechies least asymmetric wavelet filter of
length L = 8, taken from Percival and Walden (1999, Sec. 4.4).
g0 = -0.0757657147893407
g1 = -0.0296355276459541
g2 =  0.4976186676324578
g3 =  0.8037387518052163
g4 =  0.2978577956055422
g5 = -0.0992195435769354
g6 = -0.0126039672622612
g7 =  0.0322231006040713

The orthonormality (Equation (3.2)) and orthogonality to its even shifts
(Equation (3.3)) seen for the Haar wavelet filter, and shared by both Daubechies families of
wavelet filters used here, can be succinctly expressed using the squared gain function
of the wavelet filter via
\[
\mathcal{H}(f) + \mathcal{H}\!\left(f + \tfrac{1}{2}\right) = 2 \quad \text{for all } f.
\tag{3.5}
\]
To illustrate these properties, we can show the Haar wavelet filter satisfies
Equation (3.5) since
\[
H^{(\mathrm{Haar})}(f) = \sum_{l=0}^{1} h_l e^{-i2\pi fl}
= \frac{1 - e^{-i2\pi f}}{\sqrt{2}}
= i\sqrt{2}\, e^{-i\pi f} \sin(\pi f)
\]
and therefore the squared gain function is $\mathcal{H}^{(\mathrm{Haar})}(f) = 2\sin^2(\pi f)$. Using the
relationship that $\cos(\pi f) = \sin\!\left(\pi\left(f + \tfrac{1}{2}\right)\right)$, we have
\[
\mathcal{H}^{(\mathrm{Haar})}(f) + \mathcal{H}^{(\mathrm{Haar})}\!\left(f + \tfrac{1}{2}\right)
= 2\sin^2(\pi f) + 2\cos^2(\pi f) = 2.
\]
Alternative ways of expressing Equation (3.5), say, using the squared gain function
for the scaling coefficients or combinations between the two, are
\[
\mathcal{G}(f) + \mathcal{G}\!\left(f + \tfrac{1}{2}\right) = 2
\quad \text{or} \quad
\mathcal{G}(f) + \mathcal{H}(f) = 2 \quad \text{for all } f,
\]
and follow from the fact that they both have unit period and their quadrature mirror
relationship (Equation (3.4)).
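
Equation (3.5) and its alternatives can be checked numerically. In the Python sketch below, the helper names `qmf_wavelet` and `squared_gain` are hypothetical. The Haar and LA(8) scaling coefficients are those given in the text and in Table 3.1. The D(4) scaling coefficients are an assumption, chosen so that Equation (3.1) reproduces the D(4) wavelet coefficients stated above.

```python
import numpy as np


def qmf_wavelet(g):
    """Wavelet filter from scaling filter via Equation (3.1): h_l = (-1)^l g_{L-1-l}."""
    g = np.asarray(g, dtype=float)
    L = g.size
    return np.array([(-1) ** l * g[L - 1 - l] for l in range(L)])


def squared_gain(filt, f):
    """Squared gain |sum_l filt_l exp(-i 2 pi f l)|^2 at each frequency in f."""
    filt = np.asarray(filt, dtype=float)
    l = np.arange(filt.size)
    transfer = np.exp(-2j * np.pi * np.outer(f, l)) @ filt
    return np.abs(transfer) ** 2


r2, r3 = np.sqrt(2.0), np.sqrt(3.0)
SCALING = {
    "Haar": np.array([1.0, 1.0]) / r2,
    # Assumed D(4) scaling filter, consistent with the wavelet coefficients in the text.
    "D(4)": np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * r2),
    # Table 3.1.
    "LA(8)": np.array([-0.0757657147893407, -0.0296355276459541,
                       0.4976186676324578, 0.8037387518052163,
                       0.2978577956055422, -0.0992195435769354,
                       -0.0126039672622612, 0.0322231006040713]),
}

if __name__ == "__main__":
    f = np.linspace(0.0, 0.5, 101)
    for name, g in SCALING.items():
        h = qmf_wavelet(g)
        err = np.max(np.abs(squared_gain(h, f) + squared_gain(h, f + 0.5) - 2.0))
        print(name, "max deviation from Equation (3.5):", err)
```

The deviations are at the level of rounding error in the tabulated coefficients.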
Now, the discrete wavelet transform (DWT) can be thought of as a sequence
of filtering operations which form a cascade of filters (cf. Section A.3). The
low-pass output from one filtering operation $\{X_t * g_l\}$ is the input to the next filtering
operation, where the filter is an upsampled version of the original filter. Upsampling
consists of inserting one zero between each of the elements of $\{h_l\}$ to form
$\{h_l^\uparrow\} \equiv \{h_0, 0, h_1, 0, \ldots, h_{L-2}, 0, h_{L-1}\}$; see, e.g., Vetterli and Kovačević (1995, Sec. 2.5.3) or
Percival and Walden (1999, Sec. 4.4). The transfer function for $\{h_l^\uparrow\}$ is
\[
H^\uparrow(f) = \sum_{l=0}^{2L-2} h_l^\uparrow e^{-i2\pi fl}
= \sum_{l=0}^{L-1} h_{2l}^\uparrow e^{-i2\pi f(2l)}
= \sum_{l=0}^{L-1} h_l e^{-i2\pi (2f)l}
= H(2f)
\]
since every other element of $\{h_l^\uparrow\}$ is zero. Using Equation (A.4), the transfer
function for the second level wavelet filter $\{h_{2,l}\}$ is $H_2(f) \equiv H(2f)G(f)$. By a
similar argument, the transfer function for the second level scaling filter $\{g_{2,l}\}$ is
determined by convolving $\{g_l\}$ with $\{g_l^\uparrow\} \equiv \{g_0, 0, g_1, 0, \ldots, g_{L-2}, 0, g_{L-1}\}$ and is therefore
$G_2(f) \equiv G(2f)G(f)$. This method can be extended to an arbitrary level $j$, by
repeatedly upsampling the filters and applying Equation (A.4), yielding the following
expressions for the transfer functions of the wavelet and scaling filters, respectively,
\[
H_j(f) \equiv H(2^{j-1} f) \prod_{l=0}^{j-2} G(2^l f)
\quad \text{and} \quad
G_j(f) \equiv \prod_{l=0}^{j-1} G(2^l f).
\tag{3.6}
\]
Intuitively, a vector of wavelet coefficients for level $j$ is composed of $j - 1$ applications
of a low-pass filter (or averaging operator) followed by one application of a high-pass
filter (or difference operator), and a vector of scaling coefficients is obtained from $j$
applications of the low-pass filter.
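
The cascade can be sketched in the time domain, where multiplying transfer functions corresponds to convolving filters. The Python sketch below is illustrative only. The names `upsample` and `level_filters` are hypothetical, and the upsampling factor $2^{j-1}$ at level $j$ (that is, $2^{j-1} - 1$ zeros between coefficients) is the time-domain counterpart of evaluating the transfer function at $2^{j-1} f$ in Equation (3.6).

```python
import numpy as np


def upsample(filt, factor):
    """Insert factor - 1 zeros between successive filter coefficients."""
    filt = np.asarray(filt, dtype=float)
    out = np.zeros((filt.size - 1) * factor + 1)
    out[::factor] = filt
    return out


def level_filters(h, g, j):
    """Return the level-j wavelet and scaling filters implied by Equation (3.6)."""
    h = np.asarray(h, dtype=float)
    g = np.asarray(g, dtype=float)
    hj, gj = h, g
    for level in range(2, j + 1):
        factor = 2 ** (level - 1)
        # Both updates use the scaling filter from the previous level.
        hj, gj = (np.convolve(upsample(h, factor), gj),
                  np.convolve(upsample(g, factor), gj))
    return hj, gj


if __name__ == "__main__":
    haar_g = np.array([1.0, 1.0]) / np.sqrt(2.0)
    haar_h = np.array([1.0, -1.0]) / np.sqrt(2.0)
    for j in (1, 2, 3):
        hj, gj = level_filters(haar_h, haar_g, j)
        print(j, hj.size, hj)
```

For a unit-scale filter of length $L$, the level-$j$ filters produced this way have length $(2^j - 1)(L - 1) + 1$.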
Figure 3.1 shows some of the common wavelets, or more specifically, wavelet basis
vectors taken from the sixth level of the transform. As the length of the wavelet filter
increases, the smoothness of the basis function increases. However, the increased
length, while improving the filters' approximation to an ideal band-pass filter,
amplifies boundary effects encountered whenever finite time series are analyzed. This
is an important feature to realize in practical situations where data may be at a
premium. From the figure, the Haar wavelet filter is a simple square-wave function,
the D(4) is quite jagged with a self-similar or fractal-like appearance to it and the
LA(8) is reasonably smooth and quite close to symmetric. When selecting a wavelet
filter, several factors must be taken into account, such as boundary effects and leakage
protection. Most importantly, the wavelet filter should agree with the underlying
structure of the physical process it is analyzing.

Figure 3.1: The Haar, D(4) and LA(8) wavelet filters for level 6 (N = 512).


The squared gain functions of the wavelet and scaling filter coefficients for the
Haar, D(4) and LA(8) wavelets are given in Figure 3.2. For comparison, the vertical
dotted lines indicate the passband of frequencies for an ideal band-pass filter. The first
column in the figure shows the squared gain functions for the unit scale wavelet filters.
As the length of the filter increases, from Haar ($L = 2$) to LA(8), the approximation
to an ideal high-pass filter for $1/4 < f < 1/2$ by the wavelet coefficients improves, as does
the approximation to an ideal low-pass filter by the scaling coefficients. The Haar
wavelet filter is seen to be a poor approximation to an ideal band-pass filter for all
scales shown.

Another interesting feature to point out is the leakage of the shorter wavelet filters.
Because a high portion of low frequencies is being captured in each scale, one may
observe a fair amount of low frequency structure at smaller scales; see, e.g., Percival
and Guttorp (1994). This is due to the poor approximation to an ideal band-pass
filter by the analyzing wavelet. However, unlike spectral analysis, where leakage
must be dealt with using tapering or other pre-processing of the data, an easy way
to eliminate (or at least suppress) leakage is to increase the length of the wavelet
filter. In practice, it is a good idea to perform a wavelet decomposition using filters
of varying lengths, in order to determine if leakage is present.

Figure 3.2: Squared gain functions for the Haar, D(4) and LA(8) wavelet filters
associated with scales $2^{j-1}$, $j = 1, \ldots, 4$ (levels 1 to 4), plotted against frequency
from 0 to 0.5. The solid line denotes the wavelet filter and the dashed line denotes
the scaling filter, while the vertical dotted lines denote the frequency bands for an
ideal band-pass filter.


3.2 The Partial Discrete Wavelet Transform


3.2.1 Definition
Here we introduce notation and concepts in order to compute the partial DWT of a
vector of observations. This section closely follows previous definitions of the DWT
by, for example, Percival and Mofjeld (1997) and McCoy, Walden, and Percival (1998).
A much more thorough introduction can soon be found in Percival and Walden (1999,
Ch. 4).
Let $\{h_1\} \equiv \{h_{1,0}, \ldots, h_{1,L-1}\}$ denote the wavelet filter coefficients of a Daubechies
compactly supported wavelet and let $\{g_1\} \equiv \{g_{1,0}, \ldots, g_{1,L-1}\}$ be the corresponding
scaling filter coefficients, defined via an analogous relationship to Equation (3.1),
specifically,
\[
g_{1,m} = (-1)^{m+1} h_{1,L-1-m}.
\]
The wavelet filter $\{h_1\}$ is associated with unit scale and we assume it satisfies
Equations (3.2) and (3.3). For any dyadic sample size $N \geq L$, let
\[
H_{1,k} = \sum_{m=0}^{N-1} h_{1,m} e^{-i2\pi mk/N}, \quad k = 0, \ldots, N - 1,
\]
be the discrete Fourier transform (DFT) of $\{h_1\}$ (cf. Equation (A.1)) and let $G_{1,k}$
denote the DFT of $\{g_1\}$. Now define the wavelet filter $\{h_j\}$ for scale $\lambda_j \equiv 2^{j-1}$ as the
inverse DFT of
\[
H_{j,k} = H_{1,\,2^{j-1} k \bmod N} \prod_{l=0}^{j-2} G_{1,\,2^l k \bmod N}, \quad k = 0, \ldots, N - 1
\]
(cf. Equation (3.6)). The resulting wavelet filter associated with scale $\lambda_j$ has length
$L_j \equiv (2^j - 1)(L - 1) + 1$. Also, define the scaling filter $\{g_J\}$ for scale $\lambda_J$ as the inverse
DFT of
\[
G_{J,k} = \prod_{l=0}^{J-1} G_{1,\,2^l k \bmod N}, \quad k = 0, \ldots, N - 1
\]
(cf. Equation (3.6)).


Let $\mathcal{W}$ be an $N \times N$ matrix defining a $J$th order partial orthonormal DWT based
upon a Daubechies wavelet filter of even length $L \leq N$. The rows of $\mathcal{W}$ consist of
circularly shifted (by multiples of 2) versions of the zero-padded wavelet filters for
scale $\lambda_j$, defined via
\[
\mathbf{h}_j \equiv [\, h_{j,0}, 0, \ldots, 0, h_{j,L_j-1}, h_{j,L_j-2}, \ldots, h_{j,2}, h_{j,1} \,]^T,
\tag{3.7}
\]
where the non-zero wavelet filter coefficients are in reverse order. Constructing a
matrix from all possible circular shifts, at a particular scale $\lambda_j$, of Equation (3.7)
yields the sub-matrix $\mathcal{W}_j$. This allows us to think of the orthonormal matrix $\mathcal{W}$
being comprised of several sub-matrices, each one stacked on top of the other, such
that $\mathcal{W} = [\mathcal{W}_1, \ldots, \mathcal{W}_J, \mathcal{V}_J]^T$. For example, when $L = 4$ and $N > 4$ we get
\[
\mathcal{W}_1 =
\begin{bmatrix}
h_{1,1} & h_{1,0} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & h_{1,3} & h_{1,2} \\
h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & h_{1,1} & h_{1,0} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0}
\end{bmatrix},
\tag{3.8}
\]
where $\mathcal{W}_1$ is an $N/2 \times N$ matrix whose rows are $\mathbf{h}_1$ circularly shifted by $2m - 1$ for
$m = 1, \ldots, N/2$. The remaining sub-matrices $\mathcal{W}_2, \ldots, \mathcal{W}_J$ are defined similarly to
Equation (3.8), being shifted by $2^j m - 1$ for $m = 1, \ldots, N/2^j$, and $\mathcal{V}_J$ is identical
in dimension to $\mathcal{W}_J$ but contains circularly shifted versions of $\mathbf{g}_J$, instead of $\mathbf{h}_J$, by
$2^J m - 1$ for $m = 1, \ldots, N/2^J$. In practice, the rows of the matrix $\mathcal{W}$ are not explicitly
constructed, but instead the DWT is implemented via a pyramid algorithm (Mallat
1989) that applies wavelet coefficients to the input series and subsamples the output
one scale at a time.

When applied to a vector of observations $\mathbf{X}$, the DWT yields $N$ wavelet coefficients $\mathbf{W} = \mathcal{W}\mathbf{X}$, which can be organized into $J + 1$ vectors $\mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_J, \mathbf{V}_J]^T$, similar to $\mathcal{W}$ above, where $\mathbf{W}_j$ is a length $N/2^j$ vector of wavelet coefficients associated with changes on a scale of length $2^{j-1}$ and $\mathbf{V}_J$ is a length $N/2^J$ vector of scaling coefficients associated with averages on a scale of length $2^J$.
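As a small illustration of the matrix form, the sketch below builds $\mathcal{W}_1$ and $\mathcal{V}_1$ for a first order ($J = 1$) D(4) transform with $N = 8$ by circularly shifting the zero-padded, reversed filter by $2m - 1$, and then checks orthonormality. It assumes NumPy; the helper name `dwt_matrix_level1` is hypothetical and the small sample size is chosen only for readability.

```python
import numpy as np

def dwt_matrix_level1(h1, g1, N):
    """Stack W_1 on top of V_1 for a first order (J = 1) partial DWT."""
    L = len(h1)

    def zero_padded_reversed(f):
        # [f_0, 0, ..., 0, f_{L-1}, ..., f_2, f_1]
        v = np.zeros(N)
        v[0] = f[0]
        v[N - L + 1:] = f[:0:-1]
        return v

    hv, gv = zero_padded_reversed(h1), zero_padded_reversed(g1)
    shifts = [2 * m - 1 for m in range(1, N // 2 + 1)]
    W1 = np.array([np.roll(hv, s) for s in shifts])
    V1 = np.array([np.roll(gv, s) for s in shifts])
    return np.vstack([W1, V1])

s3 = np.sqrt(3.0)
g1 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([(-1) ** m * g1[3 - m] for m in range(4)])

W = dwt_matrix_level1(h1, g1, N=8)
print(np.allclose(W @ W.T, np.eye(8)))
```

The first row of `W` is `[h1[1], h1[0], 0, 0, 0, 0, h1[3], h1[2]]`, the same wrap-around pattern as the first row of the $L = 4$ example.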

3.2.2 Analysis of Variance


As with the DFT, orthonormality of the matrix $\mathcal{W}$ implies that the DWT is an energy preserving transform so that $\|\mathbf{W}\|^2 = \|\mathbf{X}\|^2$. This can be easily proven through basic matrix manipulation via
\[ \|\mathbf{X}\|^2 = \mathbf{X}^T\mathbf{X} = (\mathcal{W}^T\mathbf{W})^T \mathcal{W}^T\mathbf{W} = \mathbf{W}^T \mathcal{W}\mathcal{W}^T \mathbf{W} = \mathbf{W}^T\mathbf{W} = \|\mathbf{W}\|^2. \]
Given the structure of the wavelet coefficients, the energy in $\mathbf{X}$ is decomposed on a scale by scale basis via
\[ \|\mathbf{X}\|^2 = \sum_{j=1}^{J} \|\mathbf{W}_j\|^2 + \|\mathbf{V}_J\|^2, \]
where $\|\mathbf{W}_j\|^2$ is the energy of $\mathbf{X}$ due to changes at scale $\lambda_j$ and $\|\mathbf{V}_J\|^2$ is the energy due to changes at scales $\lambda_J$ and higher. This property is exploited in later sections to define the wavelet variance (Section 3.4) and the wavelet covariance and correlation (Chapter 5).
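The scale by scale energy decomposition can be checked with a short pyramid implementation. This is a sketch under stated assumptions: NumPy, the Haar filter, circular boundary handling, and the hypothetical function names `dwt_step` and `partial_dwt`. It is not the implementation used in this thesis.

```python
import numpy as np

def dwt_step(v, h, g):
    """One pyramid step: circularly filter v and keep every second output."""
    N = len(v)
    w = np.zeros(N // 2)
    s = np.zeros(N // 2)
    for t in range(N // 2):
        for l in range(len(h)):
            x = v[(2 * t + 1 - l) % N]
            w[t] += h[l] * x
            s[t] += g[l] * x
    return w, s

def partial_dwt(x, h, g, J):
    """Return ([W_1, ..., W_J], V_J) for a J-th order partial DWT."""
    details, v = [], np.asarray(x, dtype=float)
    for _ in range(J):
        w, v = dwt_step(v, h, g)
        details.append(w)
    return details, v

h = np.array([1.0, -1.0]) / np.sqrt(2.0)   # Haar wavelet filter
g = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar scaling filter

x = np.random.default_rng(0).standard_normal(16)
details, smooth = partial_dwt(x, h, g, J=3)
energy = sum(np.sum(w ** 2) for w in details) + np.sum(smooth ** 2)
print(np.isclose(energy, np.sum(x ** 2)))
```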

3.3 The Maximal Overlap Discrete Wavelet Transform


3.3.1 Comparison with the DWT
The DWT is a very useful operation, but does not possess all the attributes which may be desirable for certain applications. In response to this, an alternative wavelet transform has been developed: the maximal overlap discrete wavelet transform (MODWT). The MODWT gives up orthogonality in order to gain other features the DWT does not possess. It does this by not subsampling the filtered output at each scale. A consequence of this is that the wavelet and scaling coefficients must be rescaled in order to retain the energy preserving property of the DWT.
The following properties are important in distinguishing the MODWT from the DWT (Percival and Mofjeld 1997):

1. The MODWT can handle any sample size $N$, while the $J$th order partial DWT restricts the sample size to a multiple of $2^J$.

2. The details and smooths of a MODWT multiresolution analysis are associated with zero phase filters. This means events in the original time series may be properly aligned with features in a multiresolution analysis.

3. The MODWT is invariant to circularly shifting the original time series. Hence, shifting the time series by a given amount will simply shift the MODWT wavelet and scaling coefficients by the same amount. This property does not hold for the DWT.

4. While both the DWT and MODWT can perform an analysis of variance on a time series, the MODWT wavelet variance estimator is asymptotically more efficient than the same estimator based on the DWT (Percival 1995).

The transform goes by several names in the statistical and engineering literature, such as the stationary DWT and translation-invariant DWT; see Percival and Mofjeld (1997) for more details.

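3.3.2 Definition
The brief introduction presented here follows from Percival and Mofjeld (1997). A thorough discussion of the MODWT will appear in Percival and Walden (1999, Ch. 5).
The notation follows from the DWT, with the $J$th order partial MODWT being defined by $\widetilde{\mathbf{W}} = \widetilde{\mathcal{W}}\mathbf{X}$, where $\widetilde{\mathbf{W}}$ is composed of $J + 1$ length $N$ vectors, $\widetilde{\mathbf{W}}_1, \ldots, \widetilde{\mathbf{W}}_J$ and $\widetilde{\mathbf{V}}_J$, which can be arranged in the following manner:
\[ \widetilde{\mathbf{W}} \equiv [\widetilde{\mathbf{W}}_1, \widetilde{\mathbf{W}}_2, \ldots, \widetilde{\mathbf{W}}_J, \widetilde{\mathbf{V}}_J]^T. \]
The vector of wavelet coefficients $\widetilde{\mathbf{W}}_j$ is associated with changes of length $\lambda_j = 2^{j-1}$ and $\widetilde{\mathbf{V}}_J$ is associated with averages of lengths $\lambda_J$ and higher. For time series of dyadic length, the MODWT can be subsampled and rescaled to obtain the DWT.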
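Similar to the matrix $\mathcal{W}$ for the DWT, the matrix $\widetilde{\mathcal{W}}$ is also made up of $J + 1$ sub-matrices, each of them $N \times N$, and may be expressed as $\widetilde{\mathcal{W}} = [\widetilde{\mathcal{W}}_1, \ldots, \widetilde{\mathcal{W}}_J, \widetilde{\mathcal{V}}_J]^T$. In this case, when $L = 4$ and $N > 4$, we have
\[
\widetilde{\mathcal{W}}_1 =
\begin{bmatrix}
\tilde h_{1,0} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} \\
\tilde h_{1,1} & \tilde h_{1,0} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & \tilde h_{1,3} & \tilde h_{1,2} \\
\tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & \tilde h_{1,3} \\
\tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0}
\end{bmatrix},
\tag{3.9}
\]
where $\widetilde{\mathcal{W}}_1$ is an $N \times N$ matrix whose rows are simply the rescaled wavelet filter coefficients $\tilde{\mathbf{h}}_1 = \mathbf{h}_1/2^{1/2}$ circularly shifted by $m - 1$ for $m = 1, \ldots, N$. In general, let $\tilde{\mathbf{h}}_j \equiv \mathbf{h}_j/2^{j/2}$ and $\tilde{\mathbf{g}}_J \equiv \mathbf{g}_J/2^{J/2}$ be, respectively, the rescaled wavelet and scaling filter coefficients required to construct $\widetilde{\mathcal{W}}$. The remaining sub-matrices $\widetilde{\mathcal{W}}_2, \ldots, \widetilde{\mathcal{W}}_J$ are constructed similarly to Equation (3.9) and $\widetilde{\mathcal{V}}_J$ has the same structure as $\widetilde{\mathcal{W}}_J$, only using circularly shifted scaling coefficients instead of the wavelet coefficients. Circular shifting for all scales is identical to that of Equation (3.9). In practice, a pyramid scheme is utilized similar to that of the DWT; see, e.g., Percival and Walden (1999, Sec. 5.4).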
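The row structure of the level-1 MODWT matrix corresponds to circular filtering without subsampling. The sketch below (assuming NumPy and the D(4) filter; the function name `modwt_level1` is hypothetical) applies it to a series whose length is not a power of two and checks that circularly shifting the input shifts the coefficients by the same amount.

```python
import numpy as np

def modwt_level1(x, h1, g1):
    """Level-1 MODWT wavelet and scaling coefficients (circular filtering)."""
    x = np.asarray(x, dtype=float)
    ht = np.asarray(h1) / np.sqrt(2.0)   # rescaled wavelet filter
    gt = np.asarray(g1) / np.sqrt(2.0)   # rescaled scaling filter
    w = np.zeros_like(x)
    v = np.zeros_like(x)
    for l in range(len(ht)):
        lagged = np.roll(x, l)           # lagged[t] == x[(t - l) mod N]
        w += ht[l] * lagged
        v += gt[l] * lagged
    return w, v

s3 = np.sqrt(3.0)
g1 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([(-1) ** m * g1[3 - m] for m in range(4)])

x = np.random.default_rng(1).standard_normal(13)   # N = 13, not dyadic
w, v = modwt_level1(x, h1, g1)
w_shift, v_shift = modwt_level1(np.roll(x, 5), h1, g1)
print(np.allclose(w_shift, np.roll(w, 5)), np.allclose(v_shift, np.roll(v, 5)))
```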

3.3.3 Analysis of Variance


Percival and Mofjeld (1997) proved that the MODWT is an energy preserving transform and, just as with the DWT, the total energy of a time series can be partitioned using the MODWT wavelet and scaling coefficients; i.e.,
\[ \|\mathbf{X}\|^2 = \sum_{j=1}^{J} \|\widetilde{\mathbf{W}}_j\|^2 + \|\widetilde{\mathbf{V}}_J\|^2. \]
This will allow us to construct MODWT versions of the wavelet variance (Section 3.4) and the wavelet covariance and correlation (Chapter 5).
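The energy partition can be verified numerically with a pyramid-style MODWT. This is a minimal sketch, assuming NumPy, the Haar filter and circular boundary handling; the function name `modwt` is hypothetical, and the level-$j$ step simply inserts $2^{j-1} - 1$ zeros between the rescaled filter coefficients.

```python
import numpy as np

def modwt(x, h1, g1, J):
    """J-th order partial MODWT via a pyramid of circular filters."""
    ht = np.asarray(h1, dtype=float) / np.sqrt(2.0)
    gt = np.asarray(g1, dtype=float) / np.sqrt(2.0)
    v = np.asarray(x, dtype=float)
    details = []
    for j in range(1, J + 1):
        w = np.zeros_like(v)
        s = np.zeros_like(v)
        for l in range(len(ht)):
            # lagged[t] == v[(t - 2^(j-1) l) mod N]
            lagged = np.roll(v, 2 ** (j - 1) * l)
            w += ht[l] * lagged
            s += gt[l] * lagged
        details.append(w)
        v = s
    return details, v

h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)
g1 = np.array([1.0, 1.0]) / np.sqrt(2.0)

x = np.random.default_rng(2).standard_normal(50)   # any sample size works
details, smooth = modwt(x, h1, g1, J=3)
energy = sum(np.sum(w ** 2) for w in details) + np.sum(smooth ** 2)
print(np.isclose(energy, np.sum(x ** 2)))
```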

3.4 Wavelet Variance


3.4.1 Definition
The wavelet transform has been used to decompose the variance of physical processes in many disciplines; see, e.g., Bradshaw and Spies (1992), Hudgins, Friehe, and Mayer (1993), and Wornell (1993). Percival (1995) investigated the concept of wavelet variance and showed that it decomposes the variance of a stationary process on a scale by scale basis. He also showed its asymptotic normality, thus allowing for approximate confidence intervals to be computed. We summarize some of his results and introduce notation which will be useful when defining the wavelet correlation in Section 5.1.2.
Let $\{X_t\}$ be a real valued Gaussian stationary process. The time independent wavelet variance is defined to be the variance of the wavelet coefficients at scale $\lambda_j$; i.e.,
\[ \sigma_X^2(\lambda_j) \equiv \frac{1}{2\lambda_j} \mathrm{Var}\{W_{j,t}\}. \tag{3.10} \]
Percival (1995) showed that the wavelet variance decomposes the variance of $\{X_t\}$ on a scale by scale basis; i.e.,
\[ \sum_{j=1}^{\infty} \sigma_X^2(\lambda_j) = \mathrm{Var}\{X_t\}. \]
This is analogous to the spectral density function (Equation (B.2)), which decomposes the variance of $\{X_t\}$ on a frequency by frequency basis.
We can form an unbiased estimator of the wavelet variance based upon the MODWT using
\[ \tilde\sigma_X^2(\lambda_j) \equiv \frac{1}{\widetilde N_j} \sum_{t=L_j-1}^{N-1} \widetilde W_{j,t}^2, \tag{3.11} \]
where $\widetilde N_j = N - L_j + 1$ and $L_j = (2^j - 1)(L - 1) + 1$. This can be seen by
\[ E\{\tilde\sigma_X^2(\lambda_j)\} = \frac{1}{\widetilde N_j} \sum_{t=L_j-1}^{N-1} E\{\widetilde W_{j,t}^2\} = \frac{1}{2^j} E\{W_{j,t}^2\} = \frac{1}{2\lambda_j} \mathrm{Var}\{W_{j,t}\}, \]
which yields the result in Equation (3.10). The unbiased estimator based on the DWT is given by
\[ \hat\sigma_X^2(\lambda_j) \equiv \frac{1}{2\lambda_j \widehat N_j} \sum_{t=L'_j}^{N_j-1} W_{j,t}^2, \tag{3.12} \]
where $N_j = N/2^j$, $\widehat N_j = N_j - L'_j$ and $L'_j = \lceil (L - 2)(1 - 2^{-j}) \rceil$. We utilize the wavelet variance not only when defining the wavelet correlation between two time series, but also in Chapter 6 when analyzing time series from the physical sciences.
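A minimal sketch of the MODWT estimator in Equation (3.11) follows. It assumes NumPy and the Haar filter, and the helper names are hypothetical. The level-$j$ filter is built by convolving upsampled unit-scale filters, and a "valid" convolution returns exactly the $N - L_j + 1$ coefficients unaffected by the boundary. For unit-variance white noise the wavelet variance at level $j$ is $2^{-j}$, which the estimates should approach.

```python
import numpy as np

def upsample(f, m):
    """Insert m - 1 zeros between the coefficients of f."""
    out = np.zeros((len(f) - 1) * m + 1)
    out[::m] = f
    return out

def modwt_wavelet_filter(h1, g1, j):
    """Rescaled level-j wavelet filter of length (2^j - 1)(L - 1) + 1."""
    f = upsample(h1, 2 ** (j - 1))
    for l in range(j - 1):
        f = np.convolve(f, upsample(g1, 2 ** l))
    return f / 2 ** (j / 2)

def modwt_wavelet_variance(x, h1, g1, j):
    """Average of the squared boundary-free MODWT coefficients."""
    w = np.convolve(x, modwt_wavelet_filter(h1, g1, j), mode="valid")
    return np.mean(w ** 2)

h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)
g1 = np.array([1.0, 1.0]) / np.sqrt(2.0)

x = np.random.default_rng(3).standard_normal(4096)   # unit-variance white noise
estimates = [modwt_wavelet_variance(x, h1, g1, j) for j in (1, 2, 3)]
print(estimates)   # should be near 1/2, 1/4, 1/8
```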

3.4.2 Equivalent Degrees of Freedom


The redundancy involved in the MODWT induces correlation in its wavelet coefficients. A useful concept to define is the equivalent degrees of freedom for a time series, which is based on a $\chi^2$ approximation to the distribution of (smoothed) periodogram ordinates; see, for example, Priestley (1981, pp. 466-468), Brillinger (1981, pp. 145-146) and Percival and Walden (1993, Sec. 6.10). The equivalent degrees of freedom concept has been used, for example, in estimating the statistical bandwidth of a time series (Walden and White 1990) and establishing confidence intervals for the wavelet variance based on the MODWT (Percival 1995).
Let $\{\widetilde W_{j,t}\}$ be a vector of MODWT wavelet coefficients associated with scale $\lambda_j$ for a real valued Gaussian process $\{X_t\}$ whose $d$th order backward difference is stationary, and let $L > 2d$. We define $\{\gamma_{X,\tau}(\lambda_j)\}$ to be the known autocovariance sequence of $\{\widetilde W_{j,t}\}$. If we assume the MODWT estimator of wavelet variance $\tilde\sigma_X^2(\lambda_j)$ can be approximated via
\[ \tilde\sigma_X^2(\lambda_j) \stackrel{d}{=} b_j \chi^2_{\eta_j} \tag{3.13} \]
(Rice 1945), where "$\stackrel{d}{=}$" means equal in distribution, we obtain
\[ \eta_j = \frac{\widetilde N_j\, \sigma_X^4(\lambda_j)}{\sum_{\tau=-(\widetilde N_j-1)}^{\widetilde N_j-1} \left(1 - \frac{|\tau|}{\widetilde N_j}\right) \gamma_{X,\tau}^2(\lambda_j)} \tag{3.14} \]
as an expression for the equivalent degrees of freedom of a vector of length $\widetilde N_j$ MODWT wavelet coefficients for scale $\lambda_j$ (analogous to Equation (B.4)) and
\[ b_j = \frac{\sigma_X^2(\lambda_j)}{\eta_j} \]
for the multiplicative constant. We can think of the equivalent degrees of freedom as a measure of the "sample size" for a time series with unimodal spectrum.
For our purposes, we want to adjust hypothesis testing procedures that assume uncorrelated observations. This is of interest to atmospheric scientists also; see, e.g., Bretherton et al. (1998). Specifically, substituting the equivalent degrees of freedom for a vector of MODWT wavelet coefficients for the true sample size may compensate for their correlation structure and allow the use of critical values based on sequences of independent random variables. This allows us to use the MODWT in testing homogeneity of variance, on a scale by scale basis, in Section 4.4.1.

Table 3.2: Equivalent degrees of freedom for the MODWT of white noise (N = 512) using the Haar, D(4), LA(8), and LA(16) wavelet filters. The numbers in parentheses are $\widetilde N_j$, the number of MODWT wavelet coefficients unaffected by the boundary conditions.

                                        Level
Wavelet  1            2            3            4           5           6
Haar     341.8 (511)  293.3 (509)  179.0 (505)  95.0 (497)  48.6 (481)  24.8 (449)
D(4)     312.6 (509)  244.1 (503)  129.9 (491)  65.6 (467)  33.2 (419)  16.9 (323)
LA(8)    293.9 (505)  204.0 (491)  103.1 (463)  51.9 (407)  26.3 (295)  13.5 (71)

For a particular scale $\lambda_j$, the equivalent degrees of freedom $\eta_j$ of the wavelet filter can be computed by first obtaining the autocovariance sequence $\{\gamma_{X,\tau}(\lambda_j)\}$. Assuming a MODWT of Gaussian white noise with unit variance, this is simply the inverse DFT of the squared gain function $\mathcal{H}_j(\cdot)$ of the wavelet filter (cf. Section A.3). The number of non-zero terms in $\{\gamma_{X,\tau}(\lambda_j)\}$ is quite small and can be determined from the length of the original wavelet filter. Plugging $\{\gamma_{X,\tau}(\lambda_j)\}$ into Equation (3.14) gives the equivalent degrees of freedom for the wavelet filter associated with scale $\lambda_j$. Table 3.2 gives the equivalent degrees of freedom for the MODWT wavelet coefficients, using several wavelet filters, for scales $\lambda_j$, $j = 1, \ldots, 6$. As the amount of correlation increases at each scale of the MODWT, the autocovariance involves more and more non-zero terms, forcing the equivalent degrees of freedom down.
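The recipe just described can be sketched as follows: obtain the autocovariance sequence as the inverse DFT of the squared gain function, then evaluate Equation (3.14). The code assumes NumPy, and the function names `acvs_from_filter` and `edof` are hypothetical. It is a direct evaluation of the formula and is not guaranteed to reproduce Table 3.2 digit for digit, since the exact conventions behind the table (for example, which sample size enters the weights) are not restated here.

```python
import numpy as np

def acvs_from_filter(f, n_fft):
    """ACVS of unit-variance white noise filtered by f: the inverse DFT of
    the squared gain |F_k|^2. Needs n_fft >= 2 * len(f) - 1 to avoid wrap-around."""
    return np.fft.ifft(np.abs(np.fft.fft(f, n=n_fft)) ** 2).real

def edof(f, n_tilde):
    """Equivalent degrees of freedom for n_tilde coefficients from filter f."""
    Lj = len(f)
    s = acvs_from_filter(f, 2 * Lj)       # s[tau], tau = 0, ..., Lj - 1; zero beyond
    tau = np.arange(1, min(Lj, n_tilde))  # positive lags with non-zero ACVS
    denom = s[0] ** 2 + 2.0 * np.sum((1.0 - tau / n_tilde) ** 1 * s[tau] ** 2)
    return n_tilde * s[0] ** 2 / denom

haar_level1 = np.array([0.5, -0.5])       # rescaled Haar wavelet filter
eta1 = edof(haar_level1, n_tilde=511)     # N = 512, L_1 = 2
print(round(eta1, 1))
```

For a filter that leaves white noise untouched, `edof(np.array([1.0]), n)` returns `n`, as expected for uncorrelated coefficients.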
A large sample approximation for $\eta_j/\widetilde N_j$ is possible by utilizing Cesaro summability (Titchmarsh 1939, p. 411) in the denominator of Equation (3.14) to yield, for large $\widetilde N_j$,
\[ \frac{\eta_j}{\widetilde N_j} \approx \frac{\sigma_X^4(\lambda_j)}{\sum_{\tau=-\infty}^{\infty} \gamma_{X,\tau}^2(\lambda_j)} = \left[ 1 + 2\, \frac{\sum_{\tau=1}^{\infty} \gamma_{X,\tau}^2(\lambda_j)}{\sigma_X^4(\lambda_j)} \right]^{-1}. \]
If $L_j \ge 2$, then the ratio must be smaller than 1 and thus, any application of the MODWT will result in a decrease in the effective sample size. Table 3.3 gives the large sample approximation to the ratio $\eta_j/\widetilde N_j$ for MODWT wavelet coefficients applied to white noise. Comparing the results from Table 3.2 for a sample size of N = 512, the large sample approximation gives $\eta_1 = 341.3$ versus 341.8 (a difference of $\frac{1}{2}$ df) for the unit scale Haar wavelet filter. The difference between the two methods is less than 1 df for all other scales and wavelet filters. We therefore recommend the large sample approximation for moderate to large sample sizes in practice, keeping in mind the estimate will be slightly conservative.

Table 3.3: Large sample approximation to the ratio of equivalent degrees of freedom $\eta_j/\widetilde N_j$.

                          Level
Wavelet  1       2       3       4       5       6
Haar     0.6667  0.5714  0.3478  0.1839  0.0933  0.0468
D(4)     0.6095  0.4752  0.2521  0.1266  0.0633  0.0317
LA(8)    0.5730  0.3968  0.1999  0.0999  0.0500  0.0250
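The large sample ratio depends only on the autocovariance sequence of the filter, so it is easy to evaluate. The sketch below (assuming NumPy; function names are hypothetical) computes it for the Haar filter at levels 1 to 6, taking the wavelet variance to be the lag-zero autocovariance. The values can be compared with the Haar row of Table 3.3.

```python
import numpy as np

def upsample(f, m):
    """Insert m - 1 zeros between the coefficients of f."""
    out = np.zeros((len(f) - 1) * m + 1)
    out[::m] = f
    return out

def level_j_filter(h1, g1, j):
    """Level-j wavelet filter (overall scaling does not affect the ratio)."""
    f = upsample(h1, 2 ** (j - 1))
    for l in range(j - 1):
        f = np.convolve(f, upsample(g1, 2 ** l))
    return f

def large_sample_ratio(f):
    """Squared lag-zero autocovariance over the sum of squared autocovariances."""
    s = np.correlate(f, f, mode="full")   # lags -(Lj - 1), ..., Lj - 1
    s0 = s[len(f) - 1]
    return s0 ** 2 / np.sum(s ** 2)

h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)
g1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
ratios = [large_sample_ratio(level_j_filter(h1, g1, j)) for j in range(1, 7)]
print(np.round(ratios, 4))
```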
To see how well Equation (3.13) holds for MODWT wavelet coefficients, a small Monte Carlo study (100 iterations) was performed to simulate the MODWT wavelet variance $\tilde\sigma_X^2(\lambda_j)$ and compare it to appropriate $\chi^2$ distributions. Sequences of Gaussian white noise (N = 512) were simulated, and a partial MODWT (J = 6), using the Haar wavelet filter, was applied to them. Wavelet coefficients affected by the boundary were discarded and the variance for each scale calculated. Figure 3.3 shows quantile-quantile plots for the estimated variances against $\chi^2$ distributions with degrees of freedom given in the first row of Table 3.2. Even for the higher scales, where there are relatively few degrees of freedom involved, the MODWT estimates of wavelet variance $\tilde\sigma_X^2(\lambda_j)$ appear to follow the approximation given in Equation (3.13).
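A scaled-down version of such a Monte Carlo study can be sketched for the unit scale only. The code below is an illustration under stated assumptions (NumPy, Haar filter, an arbitrary seed, and $\eta_1$ evaluated directly from Equation (3.14)); it is not the simulation code used for Figure 3.3. If Equation (3.13) holds, the rescaled statistic $\eta_1 \tilde\sigma_X^2(\lambda_1)/\sigma_X^2(\lambda_1)$ should have mean close to $\eta_1$ and variance close to $2\eta_1$.

```python
import numpy as np

rng = np.random.default_rng(1998)   # arbitrary seed, for reproducibility only
N, reps = 512, 100
n_tilde = N - 1                     # Haar, level 1: L_1 = 2
sigma2 = 0.5                        # unit-scale wavelet variance of unit white noise
# Equation (3.14) with autocovariances 1/2 (lag 0), -1/4 (lags +-1), 0 otherwise
eta = n_tilde * 0.25 / (0.25 + 2.0 * (1.0 - 1.0 / n_tilde) * 0.0625)

rescaled = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(N)
    w = (x[1:] - x[:-1]) / 2.0      # boundary-free level-1 Haar MODWT coefficients
    rescaled[r] = eta * np.mean(w ** 2) / sigma2

print(eta, rescaled.mean(), rescaled.var(ddof=1))
```

Sorting `rescaled` and plotting it against $\chi^2_{\eta}$ quantiles would give a quantile-quantile display of the kind described above.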
[Figure 3.3: six panels, Level: 1 to Level: 6. Horizontal axis: Quantiles of the Chi-square Distribution. Vertical axis: Rescaled Wavelet Variance.]

Figure 3.3: Quantile-quantile plots for the MODWT wavelet variance, using the Haar wavelet filter, against a $\chi^2$ distribution with degrees of freedom taken from the first row of Table 3.2. The wavelet variance has been adjusted at each scale in order to more easily compare it with the $\chi^2$ distribution.

The exact distribution of the MODWT wavelet variance $\tilde\sigma_X^2(\lambda_j)$ can be found using the theory of quadratic forms of normal (Gaussian) random variables. Percival (1983, Sec. 2.5) investigated this for the Allan variance (Allan 1966), which is proportional to the wavelet variance using the Haar wavelet filter. Let us define
\[ Q \equiv \frac{\widetilde N_j\, \tilde\sigma_X^2(\lambda_j)}{\sigma_X^2(\lambda_j)} = \frac{\sum_{l=L_j-1}^{N-1} \widetilde W_{j,l}^2}{\sigma_X^2(\lambda_j)} \tag{3.15} \]
to be the quadratic form of interest. Now, we may rewrite Equation (3.15) as
\[ Q = \sum_{i=1}^{\widetilde N_j} \alpha_i U_i^2, \]
where $\{U_i^2\}$ are independent $\chi^2$ random variables with 1 degree of freedom each and $\{\alpha_i\}$ are the eigenvalues of the autocorrelation matrix $B$; see, e.g., Johnson and Kotz (1970, Ch. 29). The matrix $B$ is band diagonal and computed by dividing the inverse DFT of the squared gain function for scale $\lambda_j$ (i.e., the autocovariance sequence $\{\gamma_{X,\tau}(\lambda_j)\}$) by the wavelet variance $\sigma_X^2(\lambda_j)$. To evaluate the distribution function of $Q$ we utilize the method of Imhof (1961), where the characteristic function of $Q$ is numerically inverted.
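The eigenvalue representation can be explored without implementing the Imhof inversion itself. The sketch below (assuming NumPy; the function name `autocorrelation_matrix` is hypothetical) builds the band diagonal autocorrelation matrix for the unit-scale Haar filter with N = 128 and computes its eigenvalues. Since $Q$ is a weighted sum of independent $\chi^2_1$ variables, its mean is the sum of the eigenvalues and its variance is twice the sum of their squares. Matching these two moments to a scaled $\chi^2$ gives degrees of freedom equal to the squared sum of the eigenvalues divided by the sum of their squares, which coincides with a direct evaluation of Equation (3.14).

```python
import numpy as np

def autocorrelation_matrix(f, n):
    """n x n Toeplitz autocorrelation matrix of white noise filtered by f."""
    s = np.correlate(f, f, mode="full")[len(f) - 1:]   # lags 0, ..., len(f) - 1
    rho = np.zeros(n)
    m = min(n, len(s))
    rho[:m] = s[:m] / s[0]
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return rho[lag]

f = np.array([0.5, -0.5])        # rescaled Haar wavelet filter, level 1
n = 128 - len(f) + 1             # boundary-free coefficients for N = 128
B = autocorrelation_matrix(f, n)
eig = np.linalg.eigvalsh(B)      # B is symmetric

edof_moment = eig.sum() ** 2 / np.sum(eig ** 2)
edof_formula = n / (1.0 + 2.0 * (1.0 - 1.0 / n) * 0.25)   # (3.14), rho_1 = -1/2
print(eig.sum(), edof_moment, edof_formula)
```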
The exact distribution for $Q$ was obtained from a combination of S-Plus and FORTRAN code graciously provided by Professor R. Lockhart. We are not interested in the distribution of $Q$ per se, but instead are interested in the distribution of the wavelet variance. This is easily obtained from the software using Equation (3.15) to obtain
\[ P\{Q \le q_p\} = P\left\{ \frac{\widetilde N_j\, \tilde\sigma_X^2(\lambda_j)}{\sigma_X^2(\lambda_j)} \le q_p \right\} = P\left\{ \tilde\sigma_X^2(\lambda_j) \le \frac{\sigma_X^2(\lambda_j)\, q_p}{\widetilde N_j} \right\} = p, \]
where $q_p$ is the $p$th quantile of the distribution of $Q$. For comparison, we note the corresponding distribution of the wavelet variance assuming the equivalent degrees of freedom argument (Equation (3.13)) is given by
\[ P\left\{ \tilde\sigma_X^2(\lambda_j) \le b_j \chi_p \right\} = p, \]

[Figure 3.4a: six panels, Level: 1 to Level: 6. Horizontal axis: Wavelet Variance. Vertical axis: Cumulative Distribution Function (0.0 to 1.0). Equivalent degrees of freedom shown in the panels: level 1, edof = 85.6; level 2, edof = 73.6; level 3, edof = 45.2; level 4, edof = 24.3; level 5, edof = 12.7; level 6, edof = 6.8.]

Figure 3.4a: Cumulative distribution functions for the MODWT wavelet variance, using the Haar wavelet filter. A sample size of N = 128 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

[Figure 3.4b: five panels, Level: 1 to Level: 5, with legend entries Exact and EDOF. Horizontal axis: Wavelet Variance. Vertical axis: Cumulative Distribution Function (0.0 to 1.0). Equivalent degrees of freedom shown in the panels: level 1, edof = 156.3; level 2, edof = 122.2; level 3, edof = 65.2; level 4, edof = 33.1; level 5, edof = 16.9.]

Figure 3.4b: Cumulative distribution functions for the MODWT wavelet variance, using the D(4) wavelet filter. A sample size of N = 256 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

[Figure 3.4c: four panels, Level: 1 to Level: 4, with legend entries Exact and EDOF. Horizontal axis: Wavelet Variance. Vertical axis: Cumulative Distribution Function (0.0 to 1.0). Equivalent degrees of freedom shown in the panels: level 1, edof = 146.9; level 2, edof = 102.2; level 3, edof = 51.8; level 4, edof = 26.2.]

Figure 3.4c: Cumulative distribution functions for the MODWT wavelet variance, using the LA(8) wavelet filter. A sample size of N = 256 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

where $\chi_p$ is the $p$th quantile from a $\chi^2$ distribution with $\eta_j$ degrees of freedom. The results for the Haar wavelet filter are given in Figure 3.4a, with the D(4) and LA(8) wavelet filters in Figures 3.4b-c, respectively. We see the distributions agree very well for the smaller scales: the two curves are virtually on top of one another. However, they begin to diverge slightly for higher scales (small equivalent degrees of freedom). The software limited the maximum sample size that could be analyzed. This is the reason for displaying fewer scales with respect to the D(4) and LA(8) wavelet filters.
Although the distributions based on the equivalent degrees of freedom argument do not follow the true distribution for some scales, it is difficult to determine the impact this would have when using the equivalent degrees of freedom argument to modify hypothesis tests for homogeneity of variance. This point is explicitly investigated in Section 4.4.1, where the cumulative sum of squares test statistic formed with MODWT wavelet coefficients is adjusted using the equivalent degrees of freedom on a scale by scale basis. Simulations are run to compare the empirical size of this hypothesis testing procedure using asymptotic critical values from the DWT applied to white noise.
Chapter 4
TESTING HOMOGENEITY OF VARIANCE
Suppose we have a time series that we are considering modeling as a realization of one portion $Y_1, \ldots, Y_N$ of a stationary Gaussian fractional difference process $\{Y_t\}$ defined by Equation (2.3). An important assumption behind any stationary process is that its variance is a constant independent of the time index $t$. In the context of short memory models, such as stationary autoregressive and moving average (ARMA) processes, a number of tests have been proposed for homogeneity of variance. For a time series consisting of either independent Gaussian random variables with zero mean and possibly time-dependent variances $\sigma_t^2$ or a moving average of such variables, Nuri and Herbst (1969) proposed to test the hypothesis that $\sigma_t^2$ is constant for all $t$ by using the periodogram of the squared random variables. Wichern, Miller, and Hsu (1976) proposed a moving block procedure for detecting a single change of variance at an unknown time point in an autoregressive model of order one. Hsu (1977, 1979) studied the detection of a variance shift at a single unknown point in a sequence of independent observations. Davis (1979) studied tests for a shift in the innovation variance of an autoregressive process at a specified point. Abraham and Wei (1984) used a Bayesian framework to study changes in the innovation variance of an ARMA process. Tsay (1988) looked at detecting several types of disturbances in time series, among them variance changes, by analyzing the residuals from fitting an ARMA model. Srivastava (1993) found the cumulative sum of squares procedure to perform better than the exponentially weighted moving average procedure for detecting an increase in variance in white noise sequences. Inclan and Tiao (1994) investigated the detection of multiple changes of variance in sequences of independent Gaussian random variables by recursively applying a cumulative sum of squares test to pieces of the original series. Using a similar recursive scheme, Chen and Gupta (1997) applied an information criterion test to the problem of multiple variance changes. None of the above tests have been adapted to work with long memory processes.
The DWT has already proven useful for investigating other types of nonstationary events. For example, Wang (1995) tested wavelet coefficients at fine scales to detect jumps and sharp cusps of signals embedded in Gaussian white noise, and Ogden and Parzen (1996) used wavelet coefficients to develop a data-dependent threshold for removing noise from a signal. The key property of the DWT that makes it useful for studying possible nonstationarities is that it transforms a time series into coefficients that reflect changes at various scales and at particular times. For fractional difference and related long memory processes, the DWT wavelet coefficients for a given scale are approximately uncorrelated; see Section 4.1. We show here that this approximation is good enough that a test designed for a null hypothesis of white noise can be used for testing homogeneity of variance in a long memory process on a scale by scale basis. An additional advantage of testing the output from the DWT is that the scale at which the inhomogeneity occurs can be identified. Using a variation of the DWT, the maximal overlap discrete wavelet transform (MODWT) (Section 3.3), we also investigate an auxiliary test statistic that can estimate the time at which the variance of a time series changes on a particular scale.
Here, we demonstrate how the DWT can be used to construct a test for homogeneity of variance in a time series exhibiting long memory characteristics. We begin by performing a spectral analysis of DWT wavelet coefficients, when applied to both short and long memory processes, in order to establish the approximate decorrelation property required for testing the statistical hypothesis of homogeneity of variance. We then introduce the normalized cumulative sum of squares test statistic. Its asymptotic and small sample distribution, along with its relationship to data analytic thresholding (Ogden and Parzen 1996), are investigated. Empirical size and power calculations are presented when detecting single and multiple variance change points. The ability of the cumulative sum of squares test statistic to detect a change in the long memory parameter of a fractional difference process is briefly investigated. Applications can be found in Section 6.1, where we analyze the annual minimum water levels of the Nile River, and Section 6.2, where we investigate a series of measurements related to vertical shear in the ocean.

4.1 Spectral Analysis of DWT Wavelet Coefficients


4.1.1 Long Memory Processes
The ability of the wavelet transform to decorrelate time series, such as fractional difference processes, producing DWT wavelet coefficients for a given scale which are approximately uncorrelated is well known; see, e.g., Tewfik and Kim (1992), McCoy and Walden (1996) and Wornell (1996). Here, we explore the output of the DWT when applied to a fractional difference process. Emphasis will be on the spectral properties of the DWT wavelet coefficients, instead of looking at the pairwise covariances between coefficients as in the references given above. Some information and notational conventions used here are taken from Percival and Walden (1999).
Let $\{X_t\}$ be a fractional difference process with spectrum $S_X(f) = |2\sin(\pi f)|^{-2d}$, for $|f| \le \frac{1}{2}$ (cf. Equation (2.3)). Let us restrict ourselves to looking at processes with difference parameters $0 < d < \frac{1}{2}$; i.e., stationary long memory processes. We know the vector of DWT wavelet coefficients $\mathbf{W}_1$ for unit scale is simply the original time series convolved with the wavelet filter $\mathbf{h}_1$ and subsampled by two. Now, the spectrum of a subsampled process, say $Y_t = X_{2t}$, is known to be
\[ S_Y(f) \equiv \frac{S_X\left(\frac{1}{2}f\right) + S_X\left(\frac{1}{2}f + \frac{1}{2}\right)}{2} \tag{4.1} \]
(Anderson 1971, p. 388), and from Section A.2, we know that filtering a time series corresponds to a multiplication of its spectrum with the squared gain function of the filter. Hence, the filtered coefficients have spectrum $\mathcal{H}_1(f) S(f)$, and
\[ S_1(f) \equiv \frac{\mathcal{H}_1\left(\frac{1}{2}f\right) S\left(\frac{1}{2}f\right) + \mathcal{H}_1\left(\frac{1}{2}f + \frac{1}{2}\right) S\left(\frac{1}{2}f + \frac{1}{2}\right)}{2} \tag{4.2} \]
is the spectrum of the DWT wavelet coefficients for unit scale. For the class of Daubechies wavelets, we have a closed form expression for $\mathcal{H}_1(\cdot)$, namely,
\[ \mathcal{H}_1^{(D)}(f) \equiv 2 \sin^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \cos^{2l}(\pi f) = \mathcal{D}^{L/2}(f)\, \mathcal{C}(f) \tag{4.3} \]
(Daubechies 1992, Ch. 6.1), where $\mathcal{D}(f) \equiv |2\sin(\pi f)|^2$ corresponds to a first order backward difference filter and
\[ \mathcal{C}(f) \equiv \frac{1}{2^{L-1}} \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \cos^{2l}(\pi f). \]
Substituting Equations (2.3) and (4.3) into Equation (4.2) gives us
\[
\begin{aligned}
S_1^{(D)}(f) &= \frac{\mathcal{D}^{L/2}\left(\frac{1}{2}f\right) \mathcal{C}\left(\frac{1}{2}f\right) S\left(\frac{1}{2}f\right) + \mathcal{D}^{L/2}\left(\frac{1}{2}f + \frac{1}{2}\right) \mathcal{C}\left(\frac{1}{2}f + \frac{1}{2}\right) S\left(\frac{1}{2}f + \frac{1}{2}\right)}{2} \\
&= \frac{\mathcal{D}^{-(d - L/2)}\left(\frac{1}{2}f\right) \mathcal{C}\left(\frac{1}{2}f\right) + \mathcal{D}^{-(d - L/2)}\left(\frac{1}{2}f + \frac{1}{2}\right) \mathcal{C}\left(\frac{1}{2}f + \frac{1}{2}\right)}{2}.
\end{aligned}
\]
The first filter $\mathcal{D}^{-(d - L/2)}(\cdot)$ corresponds to a fractional difference process with difference parameter $d - \frac{L}{2}$, and the second filter $\mathcal{C}(\cdot)$ has compact support.


Let us look into the characteristics of the spectrum of DWT wavelet coefficients at unit scale for the Haar wavelet. It is relatively simple to calculate, since $\mathcal{H}_1^{(\mathrm{Haar})}(f) = 2\sin^2(\pi f)$. The spectrum of the filtered fractional difference process $\{h_1 * X_t\}$ is therefore
\[ \mathcal{H}_1^{(\mathrm{Haar})}(f)\, S(f) = \tfrac{1}{2} |2\sin(\pi f)|^{-2(d-1)}. \]
That is, the spectrum of the filtered process is proportional to a fractional difference process with parameter $d' = d - 1$. Since we were looking at so called "red" processes with $0 < d < \frac{1}{2}$, this means $-1 < d' < -\frac{1}{2}$ and hence the filtered process is "blue." The colorful terminology comes from optics, where low frequencies of light are seen as red and high frequencies seen as blue. The spectrum of the DWT wavelet coefficients using the Haar wavelet is
\[ S_1^{(\mathrm{Haar})}(f) = \frac{1}{4} \left[ \left| 2\sin\left(\frac{\pi f}{2}\right) \right|^{-2(d-1)} + \left| 2\cos\left(\frac{\pi f}{2}\right) \right|^{-2(d-1)} \right]. \]
When $d = 0$, $\{X_t\}$ is a white noise process and the spectrum of the DWT wavelet coefficients for unit scale is constant, as is to be expected. Figure 4.1 shows the theoretical spectra for the unit scale DWT coefficients of fractional difference processes with $-\frac{1}{2} \le d \le \frac{1}{2}$. As the long memory parameter approaches 0.5 or $-0.5$, the wavelet coefficients have a greater amount of correlation. However, the range of the vertical axis in each plot is only from $-3$ to 3 decibels, and the variation for any particular spectrum is less than this range, so the spectrum for any choice of long memory parameter does not have much structure beyond that of white noise.
The formula given in Equation (4.2) can be extended to an arbitrary scale $\lambda_j$ using the notion of a cascading filter (cf. Section A.3). The first step is to separate $\mathcal{H}_j^{(D)}(\cdot)$ into pieces using an equivalent formula to Equation (3.6) for squared gain functions, namely,
\[ \mathcal{H}_j^{(D)}(f) = \mathcal{H}_1^{(D)}(2^{j-1} f)\, \mathcal{G}_{j-1}^{(D)}(f) = \mathcal{D}^{L/2}(2^{j-1} f)\, \mathcal{C}(2^{j-1} f)\, \mathcal{G}_{j-1}^{(D)}(f), \tag{4.4} \]
where
\[ \mathcal{G}_1^{(D)}(f) \equiv 2 \cos^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \sin^{2l}(\pi f) \]
is the squared gain function for the Daubechies scaling coefficients. Using the trigonometric identity $\sin^2(2f) = 4\sin^2(f)\cos^2(f)$, we can re-express the first term of Equation (4.4) via
\[ \mathcal{D}(2^{j-1} f) = \mathcal{D}(f) \prod_{k=0}^{j-2} 4\cos^2(\pi 2^k f). \]

[Figure 4.1: three surface plots, one each for the Haar, D(4) and LA(8) wavelet filters, with axes labeled dB, d and frequency.]

Figure 4.1: Theoretical spectra for the unit scale DWT wavelet coefficients of fractional difference processes. The z-axis ranges from $-3$ to 3 dB and, for ease of viewing, the x and y-axes have been reversed: the long memory parameter $d$ goes from $-0.5$ to $0.5$ and the frequency goes from 0 to 0.5 in the direction of the arrow.

Since we are downsampling by 2 at each level of the transform, the spectrum for a vector of DWT wavelet coefficients $\mathbf{W}_j$ associated with scale $\lambda_j$ is
\[ S_j^{(D)}(f) = \frac{1}{2^j} \sum_{k=0}^{2^j - 1} \mathcal{H}_j^{(D)}\left(\frac{f}{2^j} + \frac{k}{2^j}\right) S\left(\frac{f}{2^j} + \frac{k}{2^j}\right), \]
where
\[ \mathcal{H}_j^{(D)}(f) = \mathcal{D}^{L/2}(f) \left[ \prod_{k=0}^{j-2} 4\cos^2(\pi 2^k f) \right]^{L/2} \mathcal{C}(2^{j-1} f)\, \mathcal{G}_{j-1}^{(D)}(f). \tag{4.5} \]
That is, the spectrum is stretched by $2^j$, and then $2^j - 1$ aliased versions are added to it (Vetterli and Kovačević 1995, p. 66). This can intuitively be seen through successive applications of Equation (4.1) to the filtered spectrum.

Table 4.1: Maximum dynamic range for the spectra of DWT wavelet coefficients, in decibels (dB), when applied to fractional difference processes with long memory parameter $d$. Column headings give the range of $d$.

              Haar                  D(4)                  LA(8)
Level  [-1/2, 0]  [0, 1/2]   [-1/2, 0]  [0, 1/2]   [-1/2, 0]  [0, 1/2]
1      1.43       1.48       1.40       1.56       1.36       1.59
2      1.83       2.26       2.28       2.56       2.51       2.71
3      2.07       2.71       2.64       2.93       2.81       3.00
4      2.21       2.87       2.76       3.07       2.80       3.14
5      2.33       2.93       2.84       3.10       2.82       3.16
6      2.42       2.95       2.88       3.10       2.83       3.16

To summarize the information contained in Figure 4.1, a useful measure to introduce is the dynamic range, defined to be
\[ 10 \log_{10} \left( \frac{\max_f S(f)}{\min_f S(f)} \right) \]
(Percival and Walden 1993, p. 201). Table 4.1 gives the maximum dynamic ranges, in dB, for the spectra of DWT coefficients applied to fractional difference processes with long memory parameter $-\frac{1}{2} \le d \le \frac{1}{2}$. As the level of the DWT increases, where more and more energy is present for red processes, the dynamic range of the spectra is negligible and appears to level off around 3 dB regardless of wavelet filter. This lack of dynamic range, which corresponds to almost uncorrelated observations in the original process, is utilized in the next chapter to test for nonstationary events in the presence of long memory structure.
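The dynamic range of the unit scale Haar spectrum is simple to evaluate from the closed form $S_1^{(\mathrm{Haar})}(f) = \frac{1}{4}\left[|2\sin(\pi f/2)|^{-2(d-1)} + |2\cos(\pi f/2)|^{-2(d-1)}\right]$. The following sketch assumes NumPy and a frequency grid of my own choosing, with hypothetical function names. It is meant only to illustrate the calculation and does not claim to reproduce the entries of Table 4.1, which depend on the grids of $d$ and $f$ that were used.

```python
import numpy as np

def haar_unit_scale_spectrum(f, d):
    """Spectrum of unit-scale Haar DWT coefficients of a fractional
    difference process with parameter d (unit innovations variance)."""
    p = -2.0 * (d - 1.0)
    return 0.25 * (np.abs(2.0 * np.sin(np.pi * f / 2.0)) ** p
                   + np.abs(2.0 * np.cos(np.pi * f / 2.0)) ** p)

def dynamic_range_db(S):
    """10 log10 of the ratio of the largest to the smallest spectral value."""
    return 10.0 * np.log10(S.max() / S.min())

f = np.linspace(0.0, 0.5, 501)
ranges = {d: dynamic_range_db(haar_unit_scale_spectrum(f, d))
          for d in (-0.5, -0.25, 0.0, 0.25, 0.5)}
print(ranges)
```

For $d = 0$ the spectrum is flat, and at $d = \pm\frac{1}{2}$ the range is $10\log_{10}\sqrt{2} \approx 1.5$ dB, well inside the 3 dB band discussed above.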

4.1.2 Short Memory Processes

The frequency-domain analysis of Daubechies wavelet coefficients from long memory processes (specifically, fractional difference processes) is relatively straightforward since their spectra are of a similar form. It is also useful to look at the behavior of these wavelet coefficients from short-memory processes. We focus on two simple processes, the moving average process of order 1 (MA(1)) and autoregressive process of order 1 (AR(1)). The spectra of these two processes are given by
\[ S^{(\mathrm{MA})}(f) = \sigma^2 \left| 1 - \theta e^{-i 2\pi f} \right|^2 = \sigma^2 \left( 1 - 2\theta\cos(2\pi f) + \theta^2 \right), \]
\[ S^{(\mathrm{AR})}(f) = \sigma^2 \left| 1 - \phi e^{-i 2\pi f} \right|^{-2} = \sigma^2 \left( 1 - 2\phi\cos(2\pi f) + \phi^2 \right)^{-1}, \]
respectively (Percival and Walden 1993, pp. 392, 443).


Let us rst look at the Haar wavelet lter. The spectrum of the unit scale DWT
wavelet coecients ( 2 = 1) applied to the MA(1) process is

f 1 ; 2 cos( f ) + 2
S1(Haar,MA)(f ) = sin2 2
f
 
+ cos 2 1 ; 2 sin( f ) + 2 
2
55

Table 4.2: Maximum dynamic range for the spectra of DWT wavelet coefficients, in
decibels (dB), when applied to AR(1) and MA(1) processes with parameters $\phi$
and $\theta$, respectively.

                 Haar                    D(4)                    LA(8)
Level   $\phi \in (-1,0]$  $\phi \in [0,1)$   $\phi \in (-1,0]$  $\phi \in [0,1)$   $\phi \in (-1,0]$  $\phi \in [0,1)$
1            42.85              2.91               42.83              3.03               42.80              3.07
2             0.66              4.64                1.35              5.19                1.96              5.42
3             0.75              5.38                0.95              5.89                0.55              5.98
4             0.77              5.05                0.39              6.04                0.12              6.11
5             0.73              5.71                0.08              6.06                0.17              6.11
6             0.74              5.85                0.10              5.97                0.17              6.02

                 Haar                    D(4)                    LA(8)
Level   $\theta \in (-1,0]$  $\theta \in [0,1)$   $\theta \in (-1,0]$  $\theta \in [0,1)$   $\theta \in (-1,0]$  $\theta \in [0,1)$
1            34.54              2.90               42.98              2.88               43.03              2.84
2             1.84              3.01                2.23              4.19                2.47              4.92
3             0.73              3.00                0.73              4.70                0.71              5.50
4             0.35              3.00                0.30              4.99                0.31              5.72
5             0.19              3.01                0.16              5.17                0.20              5.77
6             0.12              3.00                0.12              5.26                0.18              5.72

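and for the AR(1) process

\[ S_1^{(\mathrm{Haar,AR})}(f) =
   \sin^2\left(\frac{\pi f}{2}\right) \left[ 1 - 2 \phi \cos(\pi f) + \phi^2 \right]^{-1}
   + \cos^2\left(\frac{\pi f}{2}\right) \left[ 1 + 2 \phi \cos(\pi f) + \phi^2 \right]^{-1}. \]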

When $\theta = 0$ for the MA(1) process, or $\phi = 0$ for the AR(1) process, we
see that the spectra equal 1 for all frequencies. This is to be expected since
the processes themselves are simply white noise.


Table 4.2 gives the maximum dynamic ranges for the spectra of DWT wavelet
coefficients applied to short-memory processes. Dynamic ranges are similar to
those encountered for fractional difference processes, except for the first
scale. As $\phi \to -1$ and $\theta \to -1$, the DWT wavelet coefficient spectra
appear to asymptote. This is easy to explain, since the spectra of these AR(1)
and MA(1) processes have a large spike of energy around $f = 0.5$. Whereas
fractional difference processes are limited in the amount of energy at higher
frequencies, these short memory processes are not. The DWT partitions the
frequency plane in octave bands and therefore captures most of the spectral
energy in the first scale. This may appear to be a fatal flaw in the
decorrelating properties of the DWT, but in the following section I discuss how
to adapt the transform to overcome this problem in practice.

Figures 4.2 and 4.3 show the spectra of unit scale DWT wavelet coefficients of
AR(1) and MA(1) processes, respectively. As already noted in Table 4.2, the
spectra are relatively featureless except where $\phi \to -1$ and
$\theta \to -1$. At first glance, the location of the asymptote may seem
counter-intuitive. It is located at $f = 0$ because of the folding of the
spectrum through subsampling to obtain the DWT wavelet coefficients (cf.
Equation (4.1)).

4.1.3 Conclusions
The DWT has been shown, through spectral theory, to approximately decorrelate
both short-memory and long memory processes. As seen in Figures 4.2 and 4.3,
this attribute appears to fail when the process asymptotes in the high frequency
range, say $f = 0.5$, instead of in the low frequency range. If this occurs in
practice, there are at least two simple ways to overcome this problem. First, a
signal processing trick of multiplying every other value of the time series by
$-1$ will reverse the spectrum of the original series. A large amount of energy
in high frequencies will therefore be shifted into the lower frequencies, where
the DWT has been shown to approximately decorrelate. Instead of preprocessing
the time series, a slight adjustment to the wavelet transform has a similar
result. Namely, switch the wavelet and scaling filters when performing the DWT.
Thus, lower frequency ranges will be filtered out and the fine partitioning of
the frequency plane will occur as $f \to 0.5$.

Figure 4.2: Theoretical spectra for the unit scale DWT wavelet coefficients of an
AR(1) process, shown in separate panels for the Haar, D(4) and LA(8) wavelet
filters. The z-axis ranges from $-5$ to 40 dB and, for ease of viewing, the x-
and y-axes have been reversed: the parameter $\phi$ goes from $-0.99$ to 0.99
and the frequency goes from 0 to 0.5 in the direction of the arrow.

Figure 4.3: Theoretical spectra for the unit scale DWT wavelet coefficients of an
MA(1) process, shown in separate panels for the Haar, D(4) and LA(8) wavelet
filters. The z-axis ranges from $-40$ to 5 dB and, for ease of viewing, the x-
and y-axes have been reversed: the parameter $\theta$ goes from $-0.99$ to 0.99
and the frequency goes from 0 to 0.5 in the direction of the arrow.

If the spectrum of the underlying process asymptotes at a frequency between 0
and 0.5, a generalization of the DWT, the discrete wavelet packet transform
(DWPT), may be used to construct an orthonormal transform of the data with an
alternative frequency tiling; see, e.g., Wickerhauser (1994) or Percival and
Walden (1999, Ch. 9) for more information on the DWPT. Such a process is a
generalization of a fractional difference process; see, e.g., Hosking (1981)
and Giraitis and Leipus (1995).
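
The sign-alternation trick mentioned in this section can be checked numerically.
The sketch below is illustrative only: it assumes an AR(1) process
$X_t = \phi X_{t-1} + \epsilon_t$ with a hypothetical value $\phi = -0.9$, and
the function names are not taken from any library. Multiplying $X_t$ by
$(-1)^t$ gives a series that obeys the same recursion with coefficient $-\phi$,
so the spike of energy moves from $f = 0.5$ to $f = 0$.

```python
import random

def simulate_ar1(n, phi, seed=0):
    """Return (x, eps) with x[t] = phi * x[t-1] + eps[t] and x[0] = eps[0]."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [eps[0]]
    for t in range(1, n):
        x.append(phi * x[-1] + eps[t])
    return x, eps

def alternate_signs(x):
    """Multiply every other value by -1, i.e. y[t] = (-1)**t * x[t]."""
    return [value if t % 2 == 0 else -value for t, value in enumerate(x)]

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag one."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den

phi = -0.9  # hypothetical value: energy concentrated near f = 0.5
x, eps = simulate_ar1(2048, phi)
y = alternate_signs(x)
print(lag1_autocorrelation(x), lag1_autocorrelation(y))
```

The lag-one autocorrelation changes sign, which is the time-domain counterpart
of reflecting the spectrum about $f = 0.25$.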

4.2 Normalized Cumulative Sum of Squares Test Statistic


4.2.1 Definition
Let $X_1, X_2, \ldots, X_N$ be a sequence of independent Gaussian (normal)
random variables with zero means and variances
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$. We would like to test the
hypothesis

\[ H_0 : \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_N^2. \tag{4.6} \]

A test statistic that can discriminate between this null hypothesis and a
variety of alternative hypotheses (such as
$H_1 : \sigma_1^2 = \cdots = \sigma_k^2 \ne \sigma_{k+1}^2 = \cdots = \sigma_N^2$,
where $k$ is an unknown change point) is the normalized cumulative sum of
squares test statistic $D$, defined as follows. Let

\[ P_k \equiv \frac{\sum_{j=1}^{k} X_j^2}{\sum_{j=1}^{N} X_j^2},
   \quad k = 1, \ldots, N - 1, \tag{4.7} \]

and define $D \equiv \max(D^+, D^-)$, where

\[ D^+ \equiv \max_{1 \le k \le N-1} \left( \frac{k}{N-1} - P_k \right)
   \quad \text{and} \quad
   D^- \equiv \max_{1 \le k \le N-1} \left( P_k - \frac{k-1}{N-1} \right). \tag{4.8} \]
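
To make the definition concrete, the following minimal Python sketch computes
$D$ from Equations (4.7) and (4.8) for a list of observations. The function name
`css_statistic` and the choice to also return the index $k$ at which the maximum
is attained are illustrative assumptions, not part of the definition.

```python
def css_statistic(x):
    """Return (D, k): the normalized cumulative sum of squares statistic and
    the 1-based index k at which the maximum deviation is attained."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(v * v for v in x)
    if total == 0.0:
        raise ValueError("sum of squares is zero")
    running, best, best_k = 0.0, -1.0, 0
    for k in range(1, n):                      # k = 1, ..., N - 1
        running += x[k - 1] ** 2
        p_k = running / total                  # Equation (4.7)
        d_plus = k / (n - 1) - p_k             # candidate for D+
        d_minus = p_k - (k - 1) / (n - 1)      # candidate for D-
        deviation = max(d_plus, d_minus)
        if deviation > best:
            best, best_k = deviation, k
    return best, best_k

d, k = css_statistic([0.0, 0.0, 3.0, 0.0])
print(d, k)
```

In this toy input all of the energy sits in the third observation, so the
largest deviation, $2/3$, occurs at $k = 2$, immediately before the jump.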

Percentage points for the distribution of $D$ under the null hypothesis can be
readily obtained through Monte Carlo simulations. When $N = 2$,

\[ P\{D \le x\} =
   \begin{cases}
     0, & x < \frac{1}{2}; \\
     P\left\{ 1 - x \le B\left(\frac{1}{2}, \frac{1}{2}\right) \le x \right\}, & \frac{1}{2} \le x < 1; \\
     1, & x \ge 1,
   \end{cases} \tag{4.9} \]

where $B\left(\frac{1}{2}, \frac{1}{2}\right)$ is a beta random variable with
parameters $\frac{1}{2}$ and $\frac{1}{2}$. The proof of this is
straightforward. Let $X_1$ and $X_2$ be two independent Gaussian random
variables with zero means and common variance (under $H_0$); then

\[ P_1 = \frac{X_1^2}{X_1^2 + X_2^2} \quad \text{and} \quad P_2 = 1. \]

The random variable $P_1$ has a beta distribution with parameters $\frac{1}{2}$
and $\frac{1}{2}$. Now, the preliminary statistics are given by $D^- = P_1$ and
$D^+ = 1 - P_1$, therefore

\begin{align*}
P\{D \le x\} &= P\{\max(P_1, 1 - P_1) \le x\} \\
             &= P\{P_1 \le x, \; 1 - P_1 \le x\} \\
             &= P\{1 - x \le P_1 \le x\}
\end{align*}

and Equation (4.9) follows directly.
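
The $N = 2$ result can be checked by simulation. The sketch below is a minimal
illustration rather than part of the proof. It relies on the standard fact that
a beta random variable with both parameters equal to $\frac{1}{2}$ has
distribution function $(2/\pi) \arcsin(\sqrt{u})$; the function names, the seed
and the number of replicates are arbitrary choices.

```python
import math
import random

def d_statistic_n2(x1, x2):
    """D for N = 2: max(P1, 1 - P1) with P1 = x1^2 / (x1^2 + x2^2)."""
    p1 = x1 * x1 / (x1 * x1 + x2 * x2)
    return max(p1, 1.0 - p1)

def beta_half_half_cdf(u):
    """Distribution function of a beta(1/2, 1/2) random variable."""
    return (2.0 / math.pi) * math.asin(math.sqrt(u))

def cdf_d_n2(x):
    """P{D <= x} for N = 2, following the three cases of Equation (4.9)."""
    if x < 0.5:
        return 0.0
    if x >= 1.0:
        return 1.0
    return beta_half_half_cdf(x) - beta_half_half_cdf(1.0 - x)

rng = random.Random(1)
draws = [d_statistic_n2(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
         for _ in range(20000)]
empirical = sum(d <= 0.75 for d in draws) / len(draws)
print(empirical, cdf_d_n2(0.75))
```

At $x = 0.75$ the closed form reduces to $(2/\pi)(\pi/3 - \pi/6) = 1/3$, and the
empirical proportion should agree with it up to Monte Carlo error.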
There is no known tractable closed form expression for $P\{D \le x\}$ with
arbitrary $N$. Hsu (1977) commented on this fact and used two methods to obtain
small sample critical values: Edgeworth expansions and fitting the first three
moments of his statistic, which is equivalent to $D$, to a one-parameter beta
distribution. Inclan and Tiao (1994) proved that, for large $N$ and $x > 0$,

\[ P\left\{ \sqrt{\frac{N}{2}} \, D \le x \right\}
   \approx P\left\{ \sup_t \left| W_t^0 \right| \le x \right\}
   = 1 + 2 \sum_{l=1}^{\infty} (-1)^l e^{-2 l^2 x^2}, \]

where $W_t^0$ is a Brownian bridge process, and the right-hand expression is
Equation (11.39) of Billingsley (1968). Table 4.3 shows how quickly the Monte
Carlo critical values converge to the quantiles of the Brownian bridge process.
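
The asymptotic critical values implied by this series are easy to compute. The
following sketch truncates the sum and inverts it by bisection; the truncation
point, the bracketing interval and the function names are assumptions made for
illustration.

```python
import math

def bridge_sup_cdf(x, terms=100):
    """P{sup_t |W_t^0| <= x} = 1 + 2 * sum_{l>=1} (-1)^l exp(-2 l^2 x^2)."""
    if x <= 0.0:
        return 0.0
    total = 1.0
    for l in range(1, terms + 1):
        total += 2.0 * (-1) ** l * math.exp(-2.0 * l * l * x * x)
    return total

def bridge_sup_quantile(prob, lo=0.2, hi=5.0, tol=1e-10):
    """Invert the distribution function by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bridge_sup_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(bridge_sup_quantile(1.0 - alpha), 3))
```

The three quantiles agree, to three decimal places, with the Brownian bridge
column of Table 4.3 (1.224, 1.358 and 1.628).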

Table 4.3: Monte Carlo critical values for the test statistic $(N/2)^{1/2} D$,
using the Haar wavelet filter, for a level $\alpha$ test. These values are based
upon 10,000 replicates. The standard error (SE) is provided for each estimate,
and was computed via $\mathrm{SE} = \{\alpha (1 - \alpha)/(10{,}000 f^2)\}^{1/2}$,
where $f$ is the histogram estimate of the density at the $(1 - \alpha)$th
quantile using a bandwidth of 0.01 (Inclan and Tiao 1994). Quantiles of a
Brownian bridge process are given at the far right ($\infty$) for comparison.

                                   Sample size
$\alpha$      8      16      32      64     128     256     512    1024   $\infty$
0.10      1.109   1.135   1.157   1.182   1.193   1.197   1.206   1.209   1.224
SE        0.003   0.003   0.003   0.003   0.003   0.003   0.003   0.003
0.05      1.232   1.265   1.293   1.313   1.326   1.329   1.345   1.341   1.358
SE        0.004   0.004   0.004   0.004   0.004   0.004   0.004   0.004
0.01      1.459   1.508   1.553   1.584   1.596   1.596   1.630   1.617   1.628
SE        0.007   0.008   0.008   0.009   0.008   0.010   0.008   0.007

4.2.2 Data Analytic Thresholding


The idea of using a Kolmogorov-type statistic for wavelet coefficients has been
proposed previously by Ogden and Parzen (1996). They were interested in applying
change-point methods to the problem of wavelet thresholding. Their result is a
threshold which is determined by the data being analyzed on a scale by scale
basis.

As in Section 4.2.1, let $X_1, \ldots, X_N$ be a sequence of independent
Gaussian random variables. The procedure is based on a sample Brownian bridge
process

\[ \widehat{W}_N^0(t) \equiv
   \begin{cases}
     \dfrac{1}{\sigma_g \sqrt{N}} \left[ \sum_{i=1}^{\lfloor Nt \rfloor} g(X_i)
       - \dfrac{\lfloor Nt \rfloor}{N} \sum_{i=1}^{N} g(X_i) \right],
       & \frac{1}{N} \le t \le 1; \\
     0, & 0 \le t < \frac{1}{N},
   \end{cases} \tag{4.10} \]

where $g(\cdot)$ is a nonlinear function, $\sigma_g^2 \equiv \mathrm{Var}\{g(X_i)\}$
and $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. The
process $\{\widehat{W}_N^0(t) \mid 0 \le t \le 1\}$ converges in distribution to
a Brownian bridge process.
In practice, their thresholding technique is implemented on a scale by scale
basis to sequences of wavelet coefficients as follows:

[1] Form the sample Brownian bridge process (Equation (4.10)) of the wavelet
    coefficients and test against the null distribution; see, e.g., Table 4.2
    in Stephens (1986) for appropriate critical values.

[2] If the null hypothesis is rejected, remove the wavelet coefficient with the
    greatest absolute value, reduce $N$ to $N - 1$, and return to [1].

[3] If the null hypothesis is not rejected, set the threshold equal to the
    absolute value of the largest (in absolute magnitude) wavelet coefficient.

Although several transformations and empirical distribution function tests were
available, Ogden and Parzen (1996) used the transformation $g(x) = x^2$ in
Equation (4.10) and the Kolmogorov-Smirnov test statistic.
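
The three steps above can be sketched in a few lines of Python. This is a rough
illustration under several assumptions that are not part of Ogden and Parzen's
procedure as described here: the asymptotic 5% point of the supremum of a
Brownian bridge (1.358) stands in for the tabulated critical values, the noise
scale is estimated once from the median absolute value of the coefficients, and
$\sigma_g = \sqrt{2} \sigma^2$ is used for $g(x) = x^2$ with Gaussian noise. All
function names are hypothetical.

```python
import math
import random

# Assumption: asymptotic 5% point of sup|Brownian bridge|, used here in place
# of finite-sample critical values.
CRITICAL_5PCT = 1.358

def noise_scale(coefficients):
    """Median absolute value / 0.6745, a MAD-type estimate of sigma."""
    ordered = sorted(abs(c) for c in coefficients)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    return median / 0.6745

def bridge_statistic(coefficients, sigma_g):
    """sup_t |W_N^0(t)| from Equation (4.10) with g(x) = x^2."""
    n = len(coefficients)
    squares = [c * c for c in coefficients]
    total = sum(squares)
    running, largest = 0.0, 0.0
    for k, value in enumerate(squares, start=1):
        running += value
        largest = max(largest, abs(running - k / n * total))
    return largest / (sigma_g * math.sqrt(n))

def data_analytic_threshold(coefficients, critical=CRITICAL_5PCT):
    """Remove the largest coefficient until the bridge test stops rejecting."""
    remaining = list(coefficients)
    sigma = noise_scale(remaining)
    sigma_g = math.sqrt(2.0) * sigma ** 2  # sd of Y^2 when Y ~ N(0, sigma^2)
    while len(remaining) > 1 and bridge_statistic(remaining, sigma_g) > critical:
        index = max(range(len(remaining)), key=lambda i: abs(remaining[i]))
        del remaining[index]
    return max(abs(c) for c in remaining)

rng = random.Random(11)
coefficients = [rng.gauss(0.0, 1.0) for _ in range(200)]
for position in (20, 60, 100, 140, 180):
    coefficients[position] = 15.0  # a few large "signal" coefficients
threshold = data_analytic_threshold(coefficients)
print(round(threshold, 3))
```

In this toy example the five large coefficients are removed first, so the
resulting threshold falls between the noise level and the signal level.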
It should be pointed out that $\sigma_g$ will not be known in practice and hence
must be estimated from the data. This problem was addressed in Ogden (1994,
Sec. 5.5) by recommending the median absolute deviation of the finest level of
wavelet coefficients as in Donoho and Johnstone (1994). By formulating the test
statistic as in Equation (4.8), the $\sigma_g$ term is no longer involved, it
being replaced by $\sqrt{2}$, which is scale independent. This is seen through
the following argument. Let $Y_1, \ldots, Y_N$ be a sequence of $j$th level
wavelet coefficients, from the DWT, obtained from a white noise process (let $N$
be dyadic for simplicity). Hence, the wavelet coefficients are also distributed
as white noise. Let $\sigma_g^2 \equiv \mathrm{Var}\{Y_1^2\}$ and
$\mu_g \equiv E\{Y_1^2\}$. We define the statistic

\[ V_N(t) \equiv \sqrt{N} \left( \frac{\sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2}
   {\sum_{k=1}^{N} Y_k^2} - \frac{\lfloor Nt \rfloor}{N} \right),
   \quad 0 \le t \le 1, \]

where $V_N(\cdot)$ is related to $D$ via

\[ \frac{1}{\sqrt{2}} \sup_t |V_N(t)| \approx \sqrt{\frac{N}{2}} \, D
   \quad \text{for large } N, \]

and define

\[ U_N(t) \equiv \frac{1}{\sqrt{N} \sigma_g} \left\{ \sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2
   - \frac{\lfloor Nt \rfloor}{N} \sum_{k=1}^{N} Y_k^2 \right\},
   \quad 0 \le t \le 1, \]

to be Ogden's Brownian bridge statistic, where $\sum_{k=1}^{0} Y_k^2 \equiv 0$.
Then

\begin{align*}
V_N(t) &= \frac{\sqrt{N}}{\sum_{k=1}^{N} Y_k^2}
          \left\{ \sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2
          - \frac{\lfloor Nt \rfloor}{N} \sum_{k=1}^{N} Y_k^2 \right\} \\
       &= \frac{1}{\sqrt{N} \sigma_g} \, \sigma_g \,
          \frac{1}{\frac{1}{N} \sum_{k=1}^{N} Y_k^2}
          \left\{ \sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2
          - \frac{\lfloor Nt \rfloor}{N} \sum_{k=1}^{N} Y_k^2 \right\} \\
       &= \frac{\sigma_g}{\frac{1}{N} \sum_{k=1}^{N} Y_k^2} \, U_N(t) \\
       &\to \frac{\sigma_g}{\mu_g} \, U \equiv V \quad \text{as } N \to \infty,
\end{align*}

where $U$ is a Brownian bridge process. Since the squared wavelet coefficients
are distributed as $\sigma^2 \chi_1^2$ random variables,

\[ \sigma_g = \sqrt{\mathrm{Var}\{Y_1^2\}} = \sqrt{2 \sigma^4}
   \quad \text{and} \quad \mu_g = E\{Y_1^2\} = \sigma^2, \]

and hence

\[ \frac{\sigma_g}{\mu_g} = \frac{\sqrt{2} \, \sigma^2}{\sigma^2} = \sqrt{2}. \]

When looking at the boundary crossing probability for a Brownian bridge process,
the asymptotic critical values for $V$ are $\sqrt{2}$ times the asymptotic
critical values for $U$. Thus, we do not need to estimate the variance of the
squared wavelet coefficients when testing for homogeneity of variance. The
relationship between Equations (4.8) and (4.10) is similar to the one between
the test of periodogram ordinates by Schuster (1898) and Fisher's g-statistic
(Fisher 1929), where standardizing by the sum of squares eliminates having to
know the variance of the time series.
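
The finite-sample identity between the two statistics can be verified
numerically. The sketch below is an illustration only: writing $V_N(t)$ for the
sum-of-squares-normalized statistic and $U_N(t)$ for Ogden's statistic, it
evaluates both at $t = k/N$ for simulated Gaussian white noise and checks that
$V_N(t)$ equals $\sigma_g$ divided by the sample mean square, times $U_N(t)$.
The noise standard deviation, the seed and the function name are hypothetical.

```python
import math
import random

def v_and_u(y, sigma_g, k):
    """Evaluate V_N(t) and U_N(t) at t = k / N for the coefficients y."""
    n = len(y)
    squares = [value * value for value in y]
    total = sum(squares)
    partial = sum(squares[:k])
    v = math.sqrt(n) * (partial / total - k / n)
    u = (partial - k / n * total) / (math.sqrt(n) * sigma_g)
    return v, u

sigma = 3.0                               # assumed white noise standard deviation
sigma_g = math.sqrt(2.0) * sigma ** 2     # sqrt(Var{Y^2}) = sqrt(2 sigma^4)
mu_g = sigma ** 2                         # E{Y^2}
rng = random.Random(7)
y = [rng.gauss(0.0, sigma) for _ in range(4096)]
mean_square = sum(value * value for value in y) / len(y)
v, u = v_and_u(y, sigma_g, k=1000)
print(v, sigma_g / mean_square * u, sigma_g / mu_g)
```

The first two printed values coincide up to rounding error, and the sample mean
square is close to $\mu_g$, so the scaling factor is close to $\sqrt{2}$
regardless of the value chosen for $\sigma$.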

4.3 Testing Procedure


We present the steps for two testing procedures: the cumulative sum of squares
test using the DWT and the cumulative sum of squares test using the MODWT.

The procedure presented here is for the DWT-based cumulative sum of squares
test statistic using Monte Carlo critical values. It

[1] generates a realization of length $N$ from a fractional difference process
    with a specified parameter $d$;

[2] computes the partial DWT of order $J$, defined in Section 3.2, using the
    Haar, D(4) and LA(8) wavelet filters;

[3] discards all coefficients on each scale that make explicit use of the
    periodic boundary conditions;

[4] computes the test statistic $D$ for all scales based upon the remaining
    wavelet coefficients; and

[5] rejects the null hypothesis if $(N/2)^{1/2} D$ is greater than the Monte
    Carlo white noise critical levels.

A slight modification may be made to the DWT-based procedure for large sample
sizes, specifically, using asymptotic critical values instead of those obtained
through Monte Carlo experiments (cf. Table 4.3).

The MODWT-based cumulative sum of squares procedure, using Monte Carlo critical
values, is similar to the DWT-based procedure already defined: simply
substitute the MODWT for the DWT. Asymptotic critical values are not available
since the MODWT wavelet coefficients are correlated. However, a slight
modification may be made by substituting the equivalent degrees of freedom
$\eta_j$ (cf. Section 3.4.2) for the sample size $\widetilde{N}_j$ when testing
the statistic $D$. If we believe that the equivalent degrees of freedom
adequately captures the inherent correlation in the wavelet coefficients, then
asymptotic critical values based on the DWT (which assume uncorrelated wavelet
coefficients) may be used. Thus, we are freed from Monte Carlo experiments in
order to test arbitrary sample sizes.
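
A stripped-down version of the DWT-based procedure can be sketched as follows.
Several simplifications are assumed and should be read as such: only the Haar
filter is implemented (so no coefficients are affected by the boundary when the
length is a multiple of $2^J$), Gaussian white noise with a variance change
stands in for a fractional difference process, and the asymptotic 5% critical
value 1.358 replaces the Monte Carlo critical levels. The function names are
hypothetical.

```python
import math
import random

ASYMPTOTIC_5PCT = 1.358  # assumption: asymptotic 5% critical value (cf. Table 4.3)

def haar_dwt_levels(x, order):
    """Haar wavelet coefficients for levels 1..order via the pyramid algorithm."""
    levels, scaling = [], list(x)
    for _ in range(order):
        pairs = list(zip(scaling[0::2], scaling[1::2]))
        levels.append([(b - a) / math.sqrt(2.0) for a, b in pairs])
        scaling = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return levels

def css_statistic(w):
    """Normalized cumulative sum of squares statistic D of Equation (4.8)."""
    n = len(w)
    total = sum(value * value for value in w)
    running, d = 0.0, 0.0
    for k in range(1, n):
        running += w[k - 1] ** 2
        p_k = running / total
        d = max(d, k / (n - 1) - p_k, p_k - (k - 1) / (n - 1))
    return d

def css_test(x, order=4, critical=ASYMPTOTIC_5PCT):
    """Per level, return (scaled statistic, reject?) for the Haar DWT of x."""
    results = []
    for w in haar_dwt_levels(x, order):
        scaled = math.sqrt(len(w) / 2.0) * css_statistic(w)
        results.append((scaled, scaled > critical))
    return results

rng = random.Random(3)
# Hypothetical test series: the standard deviation drops from 3 to 1 at t = 512.
x = [rng.gauss(0.0, 3.0 if t < 512 else 1.0) for t in range(2048)]
results = css_test(x)
for level, (scaled, reject) in enumerate(results, start=1):
    print(level, round(scaled, 3), reject)
```

With a ninefold drop in variance, the level 1 statistic exceeds the critical
value by a wide margin.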

4.4 Testing for a Single Variance Change


4.4.1 Empirical Size
To study whether the DWT of a fractional difference process in fact behaves as
if each sub-series were white noise, as far as the test statistic $D$ is
concerned, Monte Carlo methods were employed. We determined the upper 10%, 5%
and 1% quantiles for the distribution of $D$ based upon 40,000 realizations of
white noise for a range of sample sizes commensurate with time series of sample
sizes $N = 128, 256, 512, 1024$ and 2048. Using these quantiles, we then
employed the testing procedure outlined in Section 4.3, of order $J = 4$,
repeated 10,000 times each for $d = 0.05, 0.25, 0.4$ and 0.45. The percentages
of times that $D$ exceeded the white noise critical levels are recorded in
Figure 4.4. We see that the percentages are quite close to the rejection rates
established from white noise, so the assumption that the DWT decorrelates long
memory processes is a good one for the purposes of evaluating $D$. We can thus
conduct an approximate $\alpha$ level test for variance homogeneity of a
fractional difference process on a scale by scale basis by simply using
critical levels determined under the assumption of white noise.
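
Monte Carlo critical levels of this kind can be approximated with a short
simulation. The sketch below is a scaled-down illustration, not the study
reported here: it uses 4,000 replicates rather than 40,000, works directly with
Gaussian white noise of length $n$, and takes a simple order statistic as the
quantile estimate. The function names and the seed are assumptions.

```python
import math
import random

def css_statistic(w):
    """Normalized cumulative sum of squares statistic D of Equation (4.8)."""
    n = len(w)
    total = sum(value * value for value in w)
    running, d = 0.0, 0.0
    for k in range(1, n):
        running += w[k - 1] ** 2
        p_k = running / total
        d = max(d, k / (n - 1) - p_k, p_k - (k - 1) / (n - 1))
    return d

def monte_carlo_critical_value(n, alpha, replicates=4000, seed=0):
    """Upper-alpha quantile of sqrt(n/2) * D for Gaussian white noise."""
    rng = random.Random(seed)
    stats = []
    for _ in range(replicates):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stats.append(math.sqrt(n / 2.0) * css_statistic(w))
    stats.sort()
    index = min(replicates - 1, int(math.ceil((1.0 - alpha) * replicates)) - 1)
    return stats[index]

critical_value = monte_carlo_critical_value(64, 0.05)
print(round(critical_value, 3))
```

For 64 values the estimate should fall somewhat below the asymptotic value of
1.358, in line with the pattern in Table 4.3.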
Figure 4.4 indicates that the Haar wavelet performs as well as the D(4) and
LA(8) wavelets for a fractional difference process. These latter wavelets are
more appropriate for nonstationary processes with stationary differences because
the extra implicit differencing operations ensure that the wavelet coefficients
form a stationary process with zero mean, which will not necessarily be true for
the Haar wavelet. For a fractional difference process, the simple form of the
Haar wavelet makes it possible to obtain analytical expressions for the
quantiles of $D$, a subject for future study.

Figure 4.4: Rejection rates for fractional difference processes using white
noise critical levels, $N = 128, 256, 512, 1024$ and 2048. The solid line is the
Haar wavelet filter, the dotted line is the D(4) and the dashed line is the
LA(8). Panels are arranged by level (1 to 4) and by $d = 0.05, 0.25, 0.40$ and
0.45; the horizontal axis is sample size and the vertical axis is rejection rate
(%).

Figure 4.5: Rejection rates for fractional difference processes using asymptotic
critical levels, $N = 128, 256, 512, 1024$ and 2048. The solid line is the Haar
wavelet filter, the dotted line is the D(4) and the dashed line is the LA(8).
Panels are arranged by level (1 to 4) and by $d = 0.05, 0.25, 0.40$ and 0.45;
the horizontal axis is sample size and the vertical axis is rejection rate (%).

One may not want to perform Monte Carlo studies in order to obtain critical
values for the test statistic $D$. The simulation study described above was run
again substituting the asymptotic critical values (last column of Table 4.3)
for the Monte Carlo critical values. The results are given in Figure 4.5, using
a similar vertical axis to Figure 4.4 for comparison. The percentage of times
$D$ exceeded the asymptotic critical levels was within 10% of the theoretical
quantile when there were at least 128 wavelet coefficients. The Haar wavelet
filter was found to be conservative for all sample sizes; that is, the
percentage of times $D$ exceeded the asymptotic critical levels was below the
theoretical quantile. Hence, using asymptotic critical values will give
reasonable results, if Monte Carlo critical values are unavailable, when at
least 128 wavelet coefficients are present.
To investigate how well this approximation performs for large sample sizes, the
procedure from Section 4.3 was performed for fractional difference processes of
length $N = 2^{15} = 32{,}768$ using a partial DWT of order $J = 8$. Due to the
computational time involved, the procedure was only repeated 1000 times. The
percentages of times that $D$ exceeded the white noise critical levels under
these conditions were found to be quite close to the rejection rates established
from asymptotic critical values, with increased variability due to the reduced
number of iterations in the Monte Carlo study. Thus, all the simulation studies
we have conducted to date indicate that the DWT adequately decorrelates long
memory processes for the purpose of using the test statistic $D$.
Although the MODWT of a fractional difference process exhibits correlation
between wavelet coefficients, it does retain a greater number of coefficients
per scale. This may be a useful attribute for testing a wider range of
alternative hypotheses, not just a sudden change of variance. To examine the
cumulative sum of squares procedure using the MODWT, a similar investigation to
that for the DWT was performed. The correlation structure of MODWT wavelet
coefficients invalidates our ability to use an asymptotic distribution (like a
Brownian bridge process for the DWT) when testing for homogeneity of variance.
Although Monte Carlo techniques are relatively easy to implement, they depend on
the sample size. When repeatedly testing under unknown sample sizes, e.g.,
testing multiple variance changes in Section 4.6, this requires considerable
computing time.
One possible approach is to compensate for the correlation structure by
modifying the test statistic $D$, computed via the MODWT, using the equivalent
degrees of freedom. The equivalent degrees of freedom argument was investigated
in Section 3.4.2 with respect to the wavelet variance. The distribution under
this argument was not found to differ too drastically from the true distribution
of the MODWT estimator of the wavelet variance. Here, instead of testing
$(N/2)^{1/2} D$ we propose to use $(\eta/2)^{1/2} D$, where $\eta$ is the
equivalent degrees of freedom given by Equation (3.14). By doing so, we attempt
to obviate the need for determining critical levels via Monte Carlo experiments.
To investigate this test, we followed the procedure outlined in Section 4.3, of
order $J = 4$, repeated 10,000 times each for $d = 0.05, 0.25, 0.4$ and 0.45.
The percentages of times that $(\eta/2)^{1/2} D$ exceeded the asymptotic
critical levels are recorded in Figure 4.6. We see that the percentages are
quite close to the nominal rejection rates when the long memory parameter is
smaller, and the percentages are more and more conservative as $d$ approaches
0.5. Among the three wavelet filters, the LA(8) appears to give the most
consistent rejection rates across all sample sizes and long memory parameters.
The D(4) also performs reasonably well, but is quite a bit more conservative
when compared with the LA(8) wavelet filter, especially as the number of wavelet
coefficients decreases. This problem has already been noted when using the DWT;
that is, when using asymptotic critical values all wavelet filters suffer as the
number of wavelet coefficients decreases.
The equivalent degrees of freedom adjustment is crude; however, by using it the
MODWT may be used to conduct an approximate $\alpha$ level test for variance
homogeneity of a fractional difference process on a scale by scale basis when
there are at least 64 wavelet coefficients.

Figure 4.6: Rejection rates for fractional difference processes using the MODWT
and asymptotic critical values, adjusted using equivalent degrees of freedom
($N = 128, 256, 512, 1024, 2048$). The solid line is the Haar wavelet filter,
the dotted line is the D(4) and the dashed line is the LA(8). Panels are
arranged by level (1 to 4) and by $d = 0.05, 0.25, 0.40$ and 0.45; the
horizontal axis is sample size and the vertical axis is rejection rate (%).

4.4.2 Empirical Power


With the empirical size of the proposed test established, we can look at how
powerful the test is against particular alternatives. The procedures outlined in
Section 4.3, of order $J = 4$, were repeated for a specific sample size
$N = 656$ and long memory parameter $d = 0.40$ (this particular sample size and
long memory parameter mimic the attributes of the Nile River minimum water
levels introduced in Section 6.1). One modification was made to Step [1],
namely, adding a vector of independent Gaussian random variables to the first
100 observations of the fractional difference process. Instead of adjusting the
long memory parameter, the variance ratio between the first 100 and remaining
observations was adjusted, producing variance ratios of
$\varepsilon = 1.5$, 2, 3 and 4.

Table 4.4 gives the rejection rates, at the $\alpha = 0.05$ level of
significance, for the cumulative sum of squares (CSS) method when applied to
fractional difference processes ($N = 656$, $d = 0.40$) with one variance change
at $k = 100$. The parameter $\varphi$ is defined to be the ratio of variances
within each octave band between the fractional difference process with noise and
without. For a given $\varepsilon$, the variance ratio $\varphi$ decreases as
the level increases. This is reasonable, since less and less of the spectrum is
being captured as the level of the DWT increases, so the influence of adding a
constant to all frequencies will gradually diminish.
For all variance ratios the CSS method gives reasonable rejection rates. While
an increase in variance appears to affect primarily the first two levels of the
DWT wavelet coefficients, as seen by the high percentage of rejections, higher
levels of wavelet coefficients are also affected as the magnitude of the
variance change increases. This is important to consider when deciding between a
change in variance and a change in the long memory parameter of a time series
(see Section 4.7 for further discussion).

Table 4.4: Performance of cumulative sum of squares (CSS) method for fractional
difference processes ($N = 656$, $d = 0.40$) with one change of variance at
$k = 100$. All tests were performed at the $\alpha = 0.05$ level of
significance. The parameter $\varepsilon$ indicates the variance ratio between
the first 100 and remaining observations, and the parameter $\varphi$ is the
octave band by octave band variance ratio.

                                      CSS rejection rate (%)
$\varepsilon$   Level   $\varphi$     Haar     D(4)    LA(8)
1.5             1       1.82          89.9     92.0     92.2
                2       1.55          42.5     42.7     40.1
                3       1.33          13.9     14.0     11.7
                4       1.19           8.2      7.5      7.0
2               1       2.64          99.9     99.9     99.9
                2       2.10          82.8     83.0     80.3
                3       1.65          32.9     31.7     24.8
                4       1.38          12.2     11.1      8.5
3               1       4.29         100.0    100.0    100.0
                2       3.19          98.9     98.8     98.5
                3       2.30          67.9     63.9     52.7
                4       1.75          25.4     22.3     13.1
4               1       5.93         100.0    100.0    100.0
                2       4.29          99.9     99.8     99.8
                3       2.95          84.8     82.6     71.6
                4       2.13          40.3     32.9     19.9

4.4.3 Conclusions
Several procedures for testing homogeneity of variance, on a scale by scale
basis, have been investigated with respect to fractional difference processes.
When using Monte Carlo critical values, the DWT-based cumulative sum of squares
procedure is shown to have an adequate empirical size. When using asymptotic
critical values, this procedure gives reasonable results when at least 128
wavelet coefficients are present for testing. The MODWT-based CSS procedure
using asymptotic critical values is slightly conservative when at least 128
wavelet coefficients are used.

I have also shown that the cumulative sum of squares test statistic, based on
the DWT, can successfully detect changes of variance in fractional difference
processes. Depending on the magnitude of the variance change, the first two
scales are primarily affected. This is to be expected since the octave band
variance ratio decreases as the scale of the DWT increases.

4.5 Locating a Single Variance Change


4.5.1 Auxiliary Test
We shift our attention to determining the location of a variance change in the
original time series. A naive choice of location can be based on the cumulative
sum of squares statistic $D$; i.e., on the location of the wavelet coefficient
at which the cumulative sum of squares at level $j$ achieves its maximum. Since
each wavelet coefficient is simply a linear combination of observations from the
original series, this procedure will yield a range of observations which
contains the change of variance. The subsampling inherent in the DWT, however,
causes a loss of resolution at each scale with respect to the original time
series. We thus propose to use the MODWT (Section 3.3) to more accurately
determine the location of a variance change after it has been detected by the
DWT. This way, the location of a variance change may be associated with a
specific observation in the original time series. This is another example of how
the MODWT, through a lack of subsampling, is useful over and above the usual
DWT.

4.5.2 Simulation Study


A study was conducted to investigate how well the statistic $\widetilde{D}$,
which is similar to $D$ but computed using the MODWT wavelet coefficients,
locates a single variance change in a series with long memory structure when a
change is already detected using the DWT. To do this, we implemented the
procedure described in Section 4.3, of order $J = 4$ and for one long memory
parameter $d = 0.4$, repeated 10,000 times to test for a single change of
variance in time series of length $N = 656$. The sample size and choice of long
memory parameter were motivated by the Nile River example (Section 6.1). The
following steps were added to the detection procedure in order to simultaneously
estimate the location of the variance change. The additional steps include

[2a] computing the partial MODWT of order $J$, defined in Section 3.3, using the
     Haar, D(4) and LA(8) wavelet filters;

[3a] discarding all MODWT wavelet coefficients, on each scale, affected by the
     periodic boundary conditions;

[4a] computing the statistic $\widetilde{D}$ for all scales based upon the
     remaining MODWT wavelet coefficients; and

[5a] recording the location of the wavelet coefficient at which the statistic
     $\widetilde{D}$, computed using the MODWT, attains its value and adjusting
     for the phase by shifting the location $L_j/2$ units to the left (see
     Percival and Mofjeld (1997) for more details).
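
The additional steps can be illustrated for the first level of the Haar MODWT.
This is a simplified sketch under explicit assumptions: Gaussian white noise
with a variance change replaces the fractional difference process, only the
level 1 Haar filter (width 2) is used, the detection step is skipped, and the
phase adjustment is taken to be a shift of half the filter width. The function
names are hypothetical.

```python
import random

def haar_modwt_level1(x):
    """Level 1 Haar MODWT coefficients (x[t] - x[t-1]) / 2 for t = 1..N-1.

    The t = 0 coefficient, which would wrap around to x[N-1], is discarded
    because it is affected by the periodic boundary conditions.
    """
    return [(x[t] - x[t - 1]) / 2.0 for t in range(1, len(x))]

def css_location(w):
    """Return (D, k): the CSS statistic and the 1-based index attaining it."""
    n = len(w)
    total = sum(value * value for value in w)
    running, best, best_k = 0.0, -1.0, 0
    for k in range(1, n):
        running += w[k - 1] ** 2
        p_k = running / total
        deviation = max(k / (n - 1) - p_k, p_k - (k - 1) / (n - 1))
        if deviation > best:
            best, best_k = deviation, k
    return best, best_k

def locate_variance_change(x, filter_width=2):
    """Estimate the change location from the level 1 Haar MODWT coefficients."""
    w = haar_modwt_level1(x)       # w[k - 1] is the coefficient ending at x[k]
    _, k = css_location(w)
    return k - filter_width // 2   # phase adjustment: shift half the width left

rng = random.Random(5)
# Hypothetical test series: variance ratio 16:1 with the change after t = 99.
x = [rng.gauss(0.0, 4.0 if t < 100 else 1.0) for t in range(656)]
estimate = locate_variance_change(x)
print(estimate)
```

Because the MODWT is not subsampled, the estimate refers directly to a position
in the original series rather than to a block of observations.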
Figure 4.7: Estimated locations of variance change at $k = 100$ for fractional
difference processes ($N = 656$, $d = 0.4$) using the MODWT. Each boxplot
contains a varying number of estimates corresponding to the associated rejection
rate. Panels are arranged by variance ratio (4:1, 3:1, 2:1 and 1.5:1) and by
wavelet filter (Haar, D(4) and LA(8)), with boxplots for levels 1 and 2 in each
panel; the horizontal axis is the wavelet coefficient index, from 100 to 500.

As in Section 4.4.2, Step [1] was modified by adding a sequence of white noise
to the first 100 observations, creating variance ratios of 1.5, 2, 3 and 4. The
estimated locations of the variance changes are displayed in Figure 4.7. The
estimates are roughly centered around the 100th wavelet coefficient for
$j = 1, 2$, with the spread narrowing as the variance ratio increases. There is
a very slight difference between wavelet filters, the broader spread being
associated with the longer wavelet filters. However, for variance ratios of
$\varepsilon = 2$ or greater all three wavelets appear to perform equally well.

The estimates from the first level of the MODWT have a median value closer to
the truth with much less spread at every combination of variance ratio and
wavelet filter, as compared to the second level. The slight positive bias
appearing in the first scale, with more bias in the second, appears to be an
intrinsic feature of the cumulative sum of squares method. Inclan and Tiao
(1994) showed that the average estimated location of change is biased towards
the middle of the series when using such a procedure for sample sizes of 100,
200 and 500, and variance ratios of $\varepsilon = 2$ and 3. This should be kept
in mind when interpreting the results from such an analysis.

4.5.3 Conclusions

I have shown that the cumulative sum of squares statistic, using the MODWT, can
accurately locate a change of variance in fractional difference processes. When the
variance ratio is large enough ($\varepsilon \geq 2$), the location estimates from the wavelet
coefficients at the first scale are distributed very tightly around the true location.
Wavelet coefficients at the second scale require larger variance ratios ($\varepsilon \geq 3$) to
achieve the same level of accuracy. To reduce bias, I recommend using the estimate
associated with the first level when trying to locate a sudden change of variance in
a time series.
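The location step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses white noise rather than a fractional difference process, level-1 Haar MODWT coefficients computed directly as half-differences, and a centered cumulative sum of squares in the style of Inclan and Tiao (1994) rather than the exact statistic D of Section 4.3. The function names `haar_modwt_level1` and `css_location` are hypothetical.

```python
import numpy as np

def haar_modwt_level1(x):
    """Level-1 Haar MODWT wavelet coefficients (the boundary coefficient is dropped)."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (x[1:] - x[:-1])

def css_location(w):
    """Index k (1-based) maximising |C_k / C_n - k / n|, where C_k is the
    cumulative sum of squares of the first k coefficients."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    energy = np.cumsum(w ** 2)
    d = energy / energy[-1] - np.arange(1, n + 1) / n
    return int(np.argmax(np.abs(d))) + 1

# Assumed example: white noise of length 656 with extra noise added to the
# first 100 observations, giving a variance ratio of 4 (1 + 3 versus 1).
rng = np.random.default_rng(0)
x = rng.standard_normal(656)
x[:100] += np.sqrt(3.0) * rng.standard_normal(100)
k_hat = css_location(haar_modwt_level1(x))
print("estimated change location:", k_hat)
```

Because the MODWT is not decimated, the index returned refers directly to a position in the original series, which is why the MODWT rather than the DWT is used for location.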

4.6 Testing for Multiple Variance Changes


4.6.1 Iterated Algorithm
We move on to a natural extension of the previous section: multiple unknown
variance change points. In practice, a given time series may exhibit more than one
potential change in variance. Inclan and Tiao (1994) and Chen and Gupta (1997)
have both recently looked at this problem. They employ a "bisection algorithm"
where the entire time series is initially tested. If there is a significant change of
variance, then the series is split in two about the estimated change point and each
half is tested individually. This process is iterated until no significant changes are found.
Inclan and Tiao (1994) included an additional step when detecting multiple
variance change points. After the bisection algorithm had terminated, each "potential"
change point was tested again using only those observations between its two adjacent
change points. For example, a vector of length 128 containing potential change points
at 26, 69, and 108 would again test 26 using only observations 1, ..., 69, test 69
using observations 26, ..., 108 and test 108 using observations 69, ..., 128. This was to
compensate for an apparent overestimation of the number of variance change points.
Simulations were run both with and without this additional procedure. The rejection
rates for the first two scales were found to change by up to 4% for low variance ratios
and by up to 1% for larger variance ratios. All tables using the iterated CSS method
include the extra step.
For a time series $Y_1, \ldots, Y_N$, the iterated CSS algorithm proceeds as follows:

[1] Determine the test statistic D, via the procedure described in Section 4.3, and
record the point $k_1$ at which D is attained. If D exceeds its critical value for
a given level of significance $\alpha$, then proceed to the next step. If D is less than
the critical value, the algorithm terminates.

[2] Determine the test statistic D for the new time series $Y_1, \ldots, Y_{k_1}$. If D exceeds
its critical value, then repeat step 2 until D is less than its critical value.

[3] Determine the test statistic D for the new time series $Y_{k_1}, \ldots, Y_N$. Repeat
step 3 until D is less than its critical value.

[4] Retest each potential change point using only the observations between its two
adjacent change points, as outlined above.
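The steps above can be sketched as a recursive bisection followed by one retesting pass. This is a minimal sketch, not the procedure used for the tables: it assumes the centered cumulative sum of squares statistic and the asymptotic 5% critical value 1.358 of Inclan and Tiao (1994) in place of the statistic D and critical values of Section 4.3, and it is applied to a raw sequence rather than to DWT coefficients. The names `css_statistic`, `iterated_css` and the `min_len` guard are hypothetical.

```python
import numpy as np

# Assumption: asymptotic 5% critical value for sqrt(n/2) * max|D_k|.
CRITICAL_VALUE = 1.358

def css_statistic(x):
    """Return (test statistic, k), where the first k values form the first regime."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    energy = np.cumsum(x ** 2)
    if energy[-1] == 0.0:              # all-zero segment: nothing to test
        return 0.0, 0
    d = energy / energy[-1] - np.arange(1, n + 1) / n
    k = int(np.argmax(np.abs(d)))
    return np.sqrt(n / 2.0) * abs(d[k]), k + 1

def _bisect(x, lo, hi, min_len, found):
    """Steps 1 to 3: test x[lo:hi]; if significant, split and test both halves."""
    if hi - lo < min_len:
        return
    stat, k = css_statistic(x[lo:hi])
    if stat <= CRITICAL_VALUE:
        return
    change = lo + k
    found.append(change)
    _bisect(x, lo, change, min_len, found)
    _bisect(x, change, hi, min_len, found)

def iterated_css(x, min_len=16):
    x = np.asarray(x, dtype=float)
    found = []
    _bisect(x, 0, len(x), min_len, found)
    candidates = sorted(set(found))
    # Step 4: retest each candidate between its two neighbouring change points.
    bounds = [0] + candidates + [len(x)]
    confirmed = []
    for i, change in enumerate(candidates):
        stat, _ = css_statistic(x[bounds[i]:bounds[i + 2]])
        if stat > CRITICAL_VALUE:
            confirmed.append(change)
    return confirmed
```

The retesting pass is run once here, which matches the description in the text; a change point that fails it is simply dropped.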

4.6.2 Empirical Power


The procedure outlined in Section 4.6.1 was repeated for a specific sample size N =
656 and long memory parameter d = 0.40, with a partial DWT of order J = 4.
Again, a vector of independent Gaussian random variables was added to the first 100
observations of the fractional difference processes in Step [1] of Section 4.3. Instead
of adjusting the long memory parameter, the variance ratio between the first 100 and
remaining observations was adjusted, producing variance ratios of $\varepsilon = 1.5$, 2, 3 and
4.
Table 4.5 displays simulation results for the iterated CSS method, employed in
this divide-and-conquer scheme, when detecting one unknown variance change point.
All tests were performed at the $\alpha = 0.05$ level of significance, and the same octave
band variance ratios $\varphi$ apply as in Table 4.4. We see the CSS procedure does quite
well at locating only one variance change point for all variance ratios. With a ratio
of $\varepsilon = 2$ or greater, it errs only towards multiple change points, always indicating
at least one change point in the first scales. For larger variance ratios ($\varepsilon \geq 3$) the
procedure produces rejection rates around 90% or greater in the first two scales, and
it errs on the side of three or more change points with greater frequency.
Table 4.6 displays simulation results for the iterated CSS when detecting two
unknown variance change points. Again, all tests were performed at the $\alpha = 0.05$
level. Gaussian random variables, of length 100, were added to the middle of the
series, creating two variance changes at $k_1 = 250$ and $k_2 = 350$ for a variety of

Table 4.5: Empirical power of the iterated Cumulative Sum of Squares (CSS) algorithm
for fractional difference processes (N = 512, d = 0.4) with one variance change at
k = 100.

                Haar                  D(4)                 LA(8)
Level       0      1      2       0      1      2       0      1      2
$\varepsilon = 1.5$
1         9.5   85.2    5.3     7.6   87.8    4.6     7.3   88.8    3.9
2        58.9   39.8    1.2    58.9   39.7    1.4    60.4   38.5    1.2
3        87.5   12.2    0.3    88.2   11.6    0.2    90.3    9.5    0.2
4        95.1    4.9    0.0    95.2    4.8    0.0    95.9    4.1    0.0
$\varepsilon = 2$
1         0.1   93.0    6.9     0.1   93.5    6.4     0.1   94.2    5.7
2        17.7   79.6    2.8    17.9   79.5    2.6    20.6   77.1    2.3
3        69.2   30.1    0.7    71.2   28.2    0.6    77.2   22.4    0.4
4        90.8    9.2    0.0    91.2    8.8    0.0    93.5    6.5    0.0
$\varepsilon = 3$
1         0.0   92.9    7.1     0.0   93.6    6.4     0.0   93.8    6.2
2         1.3   95.4    3.3     1.2   95.5    3.3     1.9   95.0    3.2
3        33.8   64.7    1.5    38.3   60.4    1.2    49.4   49.6    1.0
4        79.4   20.6    0.0    80.9   19.1    0.0    87.2   12.3    0.0
$\varepsilon = 4$
1         0.0   92.9    7.1     0.0   92.7    7.3     0.0   93.2    6.8
2         0.1   96.3    3.6     0.1   96.3    3.6     0.2   96.5    3.3
3        16.6   81.5    1.9    19.1   79.1    1.8    29.1   69.4    1.5
4        66.0   34.0    0.0    69.1   30.9    0.0    79.4   20.6    0.0

Table 4.6: Empirical power of the iterated Cumulative Sum of Squares (CSS) algorithm
for fractional difference processes (N = 512, d = 0.4) with two variance changes at
$k_1 = 250$ and $k_2 = 350$. Variance ratios are given by $\varepsilon$.

                   Haar                         D(4)                        LA(8)
Level       0      1      2      3       0      1      2      3       0      1      2      3
$\varepsilon = 1.5$
1        14.5    9.3   71.8    4.3    11.7   10.4   73.6    4.2    11.2   11.4   74.2    3.2
2        67.4   20.8   11.7    0.1    67.3   22.6   10.0    0.1    67.7   23.8    8.5    0.1
3        88.7   10.1    1.1    0.0    88.8   10.6    0.6    0.0    90.4    9.2    0.4    0.0
4        94.6    5.4    0.0    0.0   100.0    0.0    0.0    0.0   100.0    0.0    0.0    0.0
$\varepsilon = 2$
1         0.1    0.2   91.8    7.9     0.0    0.2   92.4    7.3     0.0    0.3   94.0    5.8
2        26.7   17.2   55.4    0.7    26.2   22.1   51.1    0.6    27.2   26.4   45.8    0.5
3        75.8   18.5    5.6    0.0    77.3   19.0    3.7    0.0    78.8   18.3    2.9    0.0
4        91.5    8.5    0.0    0.0   100.0    0.0    0.0    0.0   100.0    0.0    0.0    0.0
$\varepsilon = 3$
1         0.0    0.0   90.4    9.6     0.0    0.0   91.4    8.6     0.0    0.0   92.5    7.5
2         1.6    2.2   93.8    2.3     1.6    4.0   92.5    1.9     2.2    5.5   90.4    1.9
3        48.0   24.0   27.8    0.2    49.7   29.8   20.3    0.2    55.2   29.4   15.3    0.1
4        83.5   16.5    0.0    0.0   100.0    0.0    0.0    0.0   100.0    0.0    0.0    0.0
$\varepsilon = 4$
1         0.0    0.0   89.5   10.5     0.0    0.0   90.5    9.5     0.0    0.0   91.4    8.6
2         0.1    0.2   96.5    3.2     0.1    0.7   96.6    2.6     0.1    0.9   96.3    2.8
3        26.3   18.8   54.4    0.6    29.6   28.0   42.1    0.4    34.9   28.9   35.7    0.5
4        74.0   26.0    0.0    0.0   100.0    0.0    0.0    0.0   100.0    0.0    0.0    0.0

variance ratios. The iterated CSS method once again performs quite well for small
variance ratios ($\varepsilon = 1.5$), with a slight increase in power as the wavelet filter increases
in length. For larger variance ratios, the first scale gives a maximum rejection rate
of 94% and then hovers around 90% for very large $\varepsilon$. All errors in the first scale, for
higher variance ratios, are towards overestimating the number of variance changes.
The second scale, which exhibits almost no power for smaller variance ratios, rapidly
approaches the 90-95% range for $\varepsilon \geq 3$ and also errs primarily towards overestimating
the number of variance changes. The 100% entries for the D(4) and LA(8)
wavelet filters in the fourth scale occur because boundary effects reduce the number
of wavelet coefficients below a minimum established threshold.

4.6.3 Locating Multiple Variance Changes


The MODWT can be utilized again to estimate the location of multiple variance
changes. The procedure is slightly more complicated than in the single variance
change scenario, but manageable. For each iteration of the algorithm, estimated
locations of the variance change for both the DWT and MODWT are recorded. The
DWT estimates are used to test for homogeneity of variance and the MODWT
estimates are used to determine the time of the variance change, as in Section 4.5.
The MODWT estimate of the time of the variance change is discarded if
the variance change is found not to be significant.
Figures 4.8a-d display the estimated location of variance change for various
fractional difference processes with one change of variance. We see more and more of the
area of the histogram centered at k = 100 as the variance ratio increases. There also
appears to be a small percentage of rejections to the right of the main peak across all
levels and wavelet filters. This is to be expected, since we are not forcing the testing
procedure to stop at only one change of variance. The small percentage of second or
third variance changes (cf. Table 4.5) in the same scale appears as an increase in the
right tails of these histograms. With this feature in mind, the procedure still performs

[Figure 4.8a: histograms in a grid of panels, one per wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 1.5:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis ticks from 0 to 80.]

Figure 4.8a: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon = 1.5$.

[Figure 4.8b: histograms in a grid of panels, one per wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 2:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis ticks from 0 to 80.]

Figure 4.8b: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon = 2$.

[Figure 4.8c: histograms in a grid of panels, one per wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 3:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis ticks from 0 to 80.]

Figure 4.8c: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon = 3$.

[Figure 4.8d: histograms in a grid of panels, one per wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 4:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis ticks from 0 to 80.]

Figure 4.8d: Estimated locations of variance change at k = 100 for fractional
difference processes (N = 656, d = 0.4) using the iterated cumulative sum of squares
procedure and maximal overlap discrete wavelet transform. The variance ratio is
$\varepsilon = 4$.

quite well when the variance ratio is relatively large ($\varepsilon \geq 2$), especially in the first
two scales. The third and fourth scales are quite spread out and not recommended
for estimating variance change points in practice.
Figures 4.9a-d display the estimated location of variance change for various
fractional difference processes with two variance changes. For small variance ratios
($\varepsilon = 1.5$) the cumulative sum of squares procedure does a decent job of locating
the variance changes in the first scale, with mixed results for the second scale.
As before, we do not expect much information to come from looking at higher scales.
However, as the magnitude of the variance ratio increases, the higher scales (j = 3, 4)
do exhibit structure similar to the first two scales. Regardless, we shall strictly use
the first two scales for inference in the future. With respect to the first two scales,
as the variance ratio increases to, say, $\varepsilon = 3$ or 4, the bimodality becomes readily
apparent. As was the case for a single variance change, the longer wavelet filters give
a slightly more spread out distribution for the locations of the variance changes. To
be more precise, the estimated locations appear to be skewed to the right at $k_1 = 250$
and $k_2 = 350$ for the D(4) and LA(8) wavelet filters, especially in the second scale.

4.6.4 Conclusions

I have presented the iterated cumulative sum of squares (CSS) algorithm for
detecting and locating multiple variance changes in time series with long-range dependence.
The first scale of wavelet coefficients is quite powerful for variance ratios of $\varepsilon = 2$ or
greater, for either one or two variance change points. The second scale is equally
powerful, but only for variance ratios of $\varepsilon = 3$ or greater. This procedure also performs
well at locating single or multiple variance changes using the auxiliary test statistic
computed via the MODWT.

[Figure 4.9a: histograms in a grid of panels, one per wavelet filter (LA(8), Haar, D(4)) and level (1 to 4), variance ratio 1.5:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis: Percent of Total (0 to 40).]

Figure 4.9a: Estimated locations of variance change at $k_1 = 251$ and $k_2 = 350$ for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 1.5.

[Figure 4.9b: histograms in a grid of panels, one per wavelet filter (LA(8), Haar, D(4)) and level (1 to 4), variance ratio 2:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis: Percent of Total (0 to 40).]

Figure 4.9b: Estimated locations of variance change at $k_1 = 251$ and $k_2 = 350$ for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 2.

[Figure 4.9c: histograms in a grid of panels, one per wavelet filter (LA(8), Haar, D(4)) and level (1 to 4), variance ratio 3:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis: Percent of Total (0 to 40).]

Figure 4.9c: Estimated locations of variance change at $k_1 = 251$ and $k_2 = 350$ for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 3.

[Figure 4.9d: histograms in a grid of panels, one per wavelet filter (LA(8), Haar, D(4)) and level (1 to 4), variance ratio 4:1. Horizontal axis: Wavelet Coefficient (0 to 500); vertical axis: Percent of Total (0 to 40).]

Figure 4.9d: Estimated locations of variance change at $k_1 = 251$ and $k_2 = 350$ for
fractional difference processes (N = 656, d = 0.4) using the iterated cumulative sum
of squares procedure and maximal overlap discrete wavelet transform. The variance
ratio $\varepsilon$ is 4.

4.7 Testing for a Change in the Long Memory Parameter


4.7.1 Introduction
It is also possible to utilize the test for homogeneity of variance to test, at least
indirectly, for a change in the long memory parameter of a fractional difference process.
A change in the long memory parameter should manifest itself differently than a
sudden change of variance in each scale of wavelet coefficients. To be precise, whereas
a change in variance will primarily affect only the first two scales of wavelet
coefficients, a change in the long memory parameter should affect much higher scales
(corresponding to lower frequencies).
We restrict ourselves to the following alternative hypothesis, namely, the long
memory parameter makes a sudden change from one value to another at time $t_0$
while the process variance remains constant. How to construct such a process is
given below. Let $\{U_t\}$ be a fractional difference process with long memory parameter
$d_1$ and $\{V_t\}$ be a fractional difference process with long memory parameter $d_2$. In
both processes the innovations variance is defined to be unity. From Section 2.1.1,
we have expressions for the spectral density functions, variances, and autocovariance
sequences of these processes.
Suppose $X_1, \ldots, X_N$ is a time series where the first $t_0$ observations are a realization
of a portion of the fractional difference process $\{U_t\}$. So the variance of $X_1, \ldots, X_{t_0}$
is simply
\[
\mathrm{Var}\{X_k\} = \mathrm{Var}\{U_t\} = \frac{\Gamma(1 - 2d_1)}{[\Gamma(1 - d_1)]^2}, \qquad k = 1, \ldots, t_0
\]
(cf. Equation 2.2). Let the remaining $N - t_0$ observations be a realization of a portion
of a fractional difference process with long memory parameter $d_2$ and innovations
variance
\[
\sigma_2^2 \equiv \frac{\mathrm{Var}\{U_t\}}{\mathrm{Var}\{V_t\}}.
\]
Hence, the variance of $X_{t_0+1}, \ldots, X_N$ is equivalent to the variance of $X_1, \ldots, X_{t_0}$. The
only change at time $t_0$ occurs with respect to the long memory parameter.
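The variance matching can be checked numerically. The sketch below evaluates the variance formula above with the standard library gamma function and computes the innovations variance that the second portion of the series needs; the function names are hypothetical, and the formula is valid only for d < 1/2.

```python
from math import gamma

def fd_variance(d, innovations_variance=1.0):
    """Variance of a stationary fractional difference process (d < 1/2)."""
    return innovations_variance * gamma(1.0 - 2.0 * d) / gamma(1.0 - d) ** 2

def matching_innovations_variance(d1, d2):
    """Innovations variance for the d2 process so that its variance equals
    that of the d1 process with unit innovations variance."""
    return fd_variance(d1) / fd_variance(d2)

d1 = 0.05
for d2 in (0.25, 0.40, 0.45):
    s2 = matching_innovations_variance(d1, d2)
    # Both portions of the series now share the same process variance.
    print(d2, round(s2, 4), round(fd_variance(d2, s2), 4), round(fd_variance(d1), 4))
```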

[Figure 4.10: spectra in decibels (dB, from -10 to 15) plotted against Frequency (0.001 to 0.5, logarithmic axis) for d = 0.05, 0.25, 0.40 and 0.45.]

Figure 4.10: Spectra of fractional difference processes and octave bands of the DWT.
Frequencies between the vertical dashed lines correspond to approximate pass-bands
of the DWT. The spectra have been normalized in order to produce time series with
the variance of a fractional difference process with long memory parameter d = 0.05.

Figure 4.10 shows how the spectra from several different fractional difference
processes compare throughout octave bands which approximately correspond to scales
of the DWT. All spectra have the same total energy, which is equal to the variance
of a fractional difference process with long memory parameter d = 0.05. Since the
wavelet variance is approximately the integral of the spectral density function over
an octave band (Percival and Guttorp 1994), we would not expect to detect a change
in the variance at the scale where the spectra of the two fractional difference processes cross. In
fact, the variability of the wavelet coefficients would be greater in the section of the
time series with smaller long memory parameter than in the section with the larger long
memory parameter for the scales before they intersect, with this pattern reversed for
all scales after the intersection.

4.7.2 Simulation Results


To explore how the test for homogeneity of variance, as proposed in Section 4.3,
responds to a change in the long memory parameter of a fractional difference process, we
first simulated such processes using the methodology presented in Section 2.2. Table 4.7
gives the rejection rates for testing homogeneity of variance over 1000 simulations,
where the change in the long memory parameter occurred at three
different locations ($t_0 = 200, 512, 824$). All tests were performed at the $\alpha = 0.05$ level of
significance. As in Table 4.4, the parameter $\varphi$ gives an octave band by octave band
variance ratio, i.e., how much the underlying spectrum from the beginning of the
series differs from the underlying spectrum from the end of the series on an octave
band by octave band basis. These variance ratios are greater than one for the first
few scales and then less than one for all subsequent scales. This agrees with the
relationship between the underlying spectra as seen in Figure 4.10.
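The octave band variance ratio can be approximated by integrating the two spectral density functions over each nominal octave band. The sketch below is an assumption-laden illustration: it uses the ideal pass-band $[1/2^{j+1}, 1/2^j]$ for level j instead of the actual squared gain function of a wavelet filter, the fractional difference spectrum $\sigma^2 / |2\sin(\pi f)|^{2d}$, a midpoint rule for the integral, and hypothetical function names.

```python
import numpy as np
from math import gamma

def fd_sdf(f, d, innovations_variance=1.0):
    """Spectral density function of a fractional difference process."""
    return innovations_variance / np.abs(2.0 * np.sin(np.pi * f)) ** (2.0 * d)

def band_variance(d, level, innovations_variance=1.0, n=4000):
    """Midpoint-rule integral of the spectrum over the nominal octave band of
    the given level; the factor 2 accounts for negative frequencies."""
    lo, hi = 2.0 ** -(level + 1), 2.0 ** -level
    step = (hi - lo) / n
    mid = lo + (np.arange(n) + 0.5) * step
    return 2.0 * np.sum(fd_sdf(mid, d, innovations_variance)) * step

def octave_band_ratio(d1, d2, level):
    """Band variance at the start of the series (d1, unit innovations variance)
    over that at the end (d2, rescaled so both process variances agree)."""
    process_var = lambda d: gamma(1.0 - 2.0 * d) / gamma(1.0 - d) ** 2
    innov2 = process_var(d1) / process_var(d2)
    return band_variance(d1, level) / band_variance(d2, level, innov2)

for level in range(1, 7):
    print(level, round(octave_band_ratio(0.05, 0.25, level), 2))
```

Under these assumptions the ratios for $d_1 = 0.05$ and $d_2 = 0.25$ are above one at the first levels and below one at the higher levels, close to the $\varphi$ column of Table 4.7 (roughly 1.48 at level 1 and 0.41 at level 6).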
For a sudden change in the long memory parameter from $d_1 = 0.05$ to $d_2 = 0.25$,
the testing procedure does a fair job, with rejection rates approaching 80% when the
change occurs in the middle of the series. We also see rejection rates around 5% at
the third scale, where the two spectra associated with the first and second portions
of the process cross, and slightly higher rejection rates in the subsequent scales. We
see much higher rejection rates in the first scale ($\approx 100\%$) when the
long memory parameter changes to 0.4 or 0.45. When $d_2 = 0.4$, the spectra cross in
the fourth scale with rejection rates hovering around 5% and a slight increase in the
rejection rates for larger scales. When $d_2 = 0.45$, the spectra cross essentially on the

Table 4.7: Rejection rates for a change in the long memory parameter of a fractional
difference process (N = 1024). For all cases, observations $X_1, \ldots, X_{t_0}$ have long
memory parameter $d_1 = 0.05$, while $X_{t_0+1}, \ldots, X_N$ have long memory parameter $d_2$.
The quantity $\varphi$ provides the octave band by octave band variance ratio.

                      $t_0 = 200$           $t_0 = 512$           $t_0 = 824$
Level  $\varphi$    Haar   D(4)  LA(8)    Haar   D(4)  LA(8)    Haar   D(4)  LA(8)
$d_2 = 0.25$
1           1.48    44.8   44.6   50.9    69.6   78.1   79.2    25.2   29.8   27.8
2           1.21     7.6    9.7    9.5    15.0   15.9   16.0     6.8    7.2    5.5
3           0.93     5.1    6.5    5.7     5.7    6.1    6.2     5.7    4.5    5.5
4           0.71     5.5    5.1    5.3    10.6   11.5   12.7     9.8   10.9    7.4
5           0.54     5.1    6.7    4.5    13.3   15.4   14.0    14.7   14.3   10.8
6           0.41     5.6    5.3    4.9    15.5   15.5   11.2    17.4   14.1    5.4
$d_2 = 0.4$
1           3.08   100.0  100.0  100.0   100.0  100.0  100.0    99.8  100.0   99.8
2           2.16    77.0   80.2   82.7    96.2   97.0   96.6    47.1   49.3   42.3
3           1.37    13.1   15.5   11.4    18.9   21.8   19.7     7.1    7.7    7.0
4           0.85     5.4    5.6    5.3     6.5    7.4    5.7     5.4    6.7    6.2
5           0.53     6.5    6.2    5.0    16.9   17.0   14.6    16.0   14.8   11.2
6           0.32     6.1    6.5    5.0    19.5   15.8    9.6    24.1   20.7    6.6
$d_2 = 0.45$
1           5.76   100.0  100.0  100.0   100.0  100.0  100.0   100.0  100.0  100.0
2           3.83    99.6   99.8   99.9   100.0  100.0  100.0    96.2   97.1   96.2
3           2.28    58.7   61.9   57.5    82.0   81.8   82.6    20.3   21.1   19.2
4           1.32     9.1   10.7    8.5     8.9    7.9    9.4     5.4    4.1    5.7
5           0.76     5.4    5.2    5.7     7.2    7.8    7.5     6.0    7.6    7.4
6           0.44     5.2    5.2    6.0    13.2   10.8    8.4    13.6   13.8    6.1

boundary between the fourth and fifth scales, and hence both sets of rejection rates
are relatively small, with an increase seen in the sixth scale.

4.7.3 Conclusions
I have briefly investigated how the test for homogeneity of variance reacts to changes
in the long memory parameter of a fractional difference process, which constitute
a simple example of a generalized fractional difference process (Wang et al. 1997).
Specifically, the simulation procedure in Section 2.2 has been modified to produce time
series where the long memory parameter changes at a given time, but the variance
of the process remains constant. When applying the testing procedure to time series
simulated this way, the pattern of rejection rates across scales differs from that
found when a simple change in variance occurs. Hence, this method shows promise in
addressing a very different alternative hypothesis: one where the variance remains
constant but the autocovariance structure of the time series changes abruptly. The
current procedure is crude and would benefit greatly from additional research.

Chapter 5
WAVELET ANALYSIS OF COVARIANCE
In this chapter, we consider the use of wavelets in the analysis of multivariate
time series. In his thesis, Hudgins (1992) introduced the concepts of the wavelet
cross spectrum and wavelet cross correlation, both in terms of the continuous wavelet
transform. In a subsequent paper, Hudgins, Friehe, and Mayer (1993) applied these
concepts to atmospheric turbulence. They found the bivariate wavelet techniques
provided a better analysis of the data than traditional Fourier methods, especially
at low frequencies. Lindsay, Percival, and Rothrock (1996) defined the sample wavelet
covariance for the discrete wavelet transform (DWT) and maximal overlap discrete
wavelet transform (MODWT) along with confidence intervals based on large
sample results. These methods were applied to the surface temperature and albedo of
ice pack in the Beaufort Sea. Kawata and Arimoto (1996) discussed the wavelet
correlation and its ability to match features between two signals. They show the
estimated wavelet correlation is effective when compared to other measures of "local
correlation," such as the Gabor transform (or short-time Fourier transform). Li and
Nozaki (1997) also introduced the wavelet cross-correlation in terms of the
continuous wavelet transform and related it to the cross spectrum. They went on to use
the real portion of the wavelet cross-correlation to analyze both simulated and real
signals. Recently, Torrence and Compo (1998) discussed the cross-wavelet spectrum,
which is complex valued, and the cross-wavelet power, which is simply the magnitude
of their cross-wavelet spectrum. They also introduced confidence intervals for their
cross-wavelet power and compared the Southern Oscillation Index with the Niño 3 sea
surface temperature.

Here, we introduce quantities that measure the association between two
stationary processes based on the DWT and MODWT. First, the wavelet covariance
for bivariate stationary time series is introduced along with the notion of a
decomposition of covariance. That is, the wavelet covariance is shown to decompose the
covariance between two stationary processes on a scale by scale basis. The wavelet
correlation is also introduced, which is analogous to the usual correlation coefficient
but utilizes the wavelet covariance and wavelet variance. Asymptotic normality of the
wavelet covariance and correlation is established. Estimation procedures are provided
along with approximate confidence intervals for the estimated wavelet covariance and
wavelet correlation. Both the wavelet covariance and wavelet correlation are
generalized into the wavelet cross-covariance and wavelet cross-correlation. The lack of shift
invariance of the DWT is shown to bias the variance of the DWT estimator of the
wavelet cross-covariance. This may arise due to misalignment between the two time
series. Finally, moment properties of two potential estimators for the variance of the
wavelet covariance are investigated. One is shown to be clearly superior to the other.

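5.1 Definition of the Wavelet Covariance
Let $\{X_t\}$ and $\{Y_t\}$ be stationary processes with univariate spectra (also known as
autospectra) $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. The wavelet covariance of $\{X_t, Y_t\}$ for
scale $\lambda_j = 2^{j-1}$ is defined to be
\[
\gamma_{XY}(\lambda_j) \equiv \frac{1}{2\lambda_j} \mathrm{Cov}\left\{ W_{j,t}^{(X)}, W_{j,t}^{(Y)} \right\}
= \frac{1}{2\lambda_j} E\left\{ W_{j,t}^{(X)} W_{j,t}^{(Y)} \right\}, \tag{5.1}
\]
where $\{W_{j,t}^{(X)}\}$ and $\{W_{j,t}^{(Y)}\}$ are the scale $\lambda_j$ wavelet coefficients for $\{X_t\}$ and $\{Y_t\}$,
respectively.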

5.1.1 Decomposition of Covariance


We now show a basic result of the wavelet covariance, namely, it decomposes the
covariance between two stationary time series on a scale by scale basis. This argument
closely follows the proof of decomposition of variance for the wavelet variance (see
Percival and Walden (1999, Sec. 8.1)), the major complication here being that the
cross spectrum $S_{XY}(\cdot)$ is a complex-valued function (cf. Section C.1). We begin by
expressing the covariance between two filtered time series in the Fourier domain.
Proposition 5.1 Suppose that $\{X_t\}$ and $\{Y_t\}$ are zero-mean weakly stationary
processes with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. If $\{a_l \mid l = 0, \ldots, L-1\}$ is a
filter of length $L$ with transfer function defined to be
\[
A(f) \equiv \sum_{l=0}^{L-1} a_l e^{-i2\pi f l},
\]
then the covariance between $\{a_l \ast X_t\}$ and $\{a_l \ast Y_t\}$ is given by
\[
\mathrm{Cov}\{a_l \ast X_t, a_l \ast Y_t\} = \int_{-1/2}^{1/2} \mathcal{A}(f) S_{XY}(f) \, df,
\]
where $\mathcal{A}(f) \equiv |A(f)|^2$ is the squared gain function for $A(\cdot)$.



Proof We can apply the spectral representation theorem (Equation (B.1)) to the
stationary processes $\{X_t\}$ and $\{Y_t\}$, giving us
\[
X_t = \int_{-1/2}^{1/2} e^{i2\pi f t} \, dZ_X(f) \quad \text{and} \quad Y_t = \int_{-1/2}^{1/2} e^{i2\pi f t} \, dZ_Y(f),
\]
where $\{Z_X(\cdot)\}$ and $\{Z_Y(\cdot)\}$ are not only orthogonal but also cross-orthogonal
processes, i.e., $E[dZ_X^*(f) \, dZ_Y(f')] = 0$ for $f \neq f'$. Define $\{U_t\}$ and $\{V_t\}$ to be filtered
versions of $\{X_t\}$ and $\{Y_t\}$, respectively, i.e.,
\[
U_t \equiv a_l \ast X_t = \sum_{l=0}^{L-1} a_l X_{t-l} \quad \text{and} \quad V_t \equiv a_l \ast Y_t = \sum_{l=0}^{L-1} a_l Y_{t-l}.
\]
From Section A.2 we know that convolution in the time domain is equivalent to
multiplication in the Fourier domain; hence, alternative representations for $\{U_t\}$ and
$\{V_t\}$ are given by the spectral representation theorem (Equation (B.1)) via
\[
U_t = \int_{-1/2}^{1/2} A(f) e^{i2\pi f t} \, dZ_X(f) \quad \text{and} \quad V_t = \int_{-1/2}^{1/2} A(f) e^{i2\pi f t} \, dZ_Y(f).
\]
The covariance between $\{U_t\}$ and $\{V_t\}$, using Fubini's theorem (Lehmann 1983, p. 15)
and the fact that $\{Z_X(\cdot)\}$ and $\{Z_Y(\cdot)\}$ are cross-orthogonal processes, is therefore
\[
\begin{aligned}
\mathrm{Cov}\{U_t, V_t\} = E\{U_t V_t\}
&= E\left\{ \int_{-1/2}^{1/2} A^*(f') e^{-i2\pi f' t} \, dZ_X^*(f') \int_{-1/2}^{1/2} A(f) e^{i2\pi f t} \, dZ_Y(f) \right\} \\
&= \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} A^*(f') A(f) e^{i2\pi (f - f') t} E[dZ_X^*(f') \, dZ_Y(f)] \\
&= \int_{-1/2}^{1/2} \mathcal{A}(f) S_{XY}(f) \, df,
\end{aligned}
\]
where $S_{XY}(\cdot)$ is the cross spectrum of $\{X_t, Y_t\}$. $\Box$
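Using this fact, we can now establish the finite decomposition of covariance
between two time series using the wavelet covariance.
Proposition 5.2 Let $\{X_t\}$ and $\{Y_t\}$ be weakly stationary processes with autospectra
$S_X(\cdot)$ and $S_Y(\cdot)$, respectively. Define $\widetilde{V}_{J,t}^{(X)} \equiv \tilde{g}_{J,l} \ast X_{t-l}$ and $\widetilde{V}_{J,t}^{(Y)} \equiv \tilde{g}_{J,l} \ast Y_{t-l}$, which
are stationary processes obtained by filtering $\{X_t\}$ and $\{Y_t\}$ using the MODWT scaling
filter $\{\tilde{g}_{J,l}\}$. For any integer $J \geq 1$,
\[
\mathrm{Cov}\{X_t, Y_t\} = \mathrm{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \sum_{j=1}^{J} \gamma_{XY}(\lambda_j),
\]
where $\gamma_{XY}(\lambda_j)$ is the wavelet covariance for scale $\lambda_j$.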

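Proof Because $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$ are obtained by filtering the stationary
processes $\{X_t\}$ and $\{Y_t\}$, respectively, we know that $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$ are stationary
processes with autospectra defined by
\[
S_{j,X}(f) = \widetilde{\mathcal{H}}_j(f) S_X(f) \quad \text{and} \quad S_{j,Y}(f) = \widetilde{\mathcal{H}}_j(f) S_Y(f), \tag{5.2}
\]
where $\widetilde{\mathcal{H}}_j(\cdot)$ is the squared gain function for $\{\tilde{h}_{j,l}\}$. Since the wavelet covariance
$\gamma_{XY}(\lambda_j)$ is the covariance between $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$, and since the integral of
the cross spectrum $S_{XY}(\cdot)$ is equal to this covariance, we can use Proposition 5.1 to
obtain
\[
\gamma_{XY}(\lambda_j) = \int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j(f) S_{XY}(f) \, df;
\]
similarly,
\[
\mathrm{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} = \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f) \, df,
\]
where $\widetilde{\mathcal{G}}_J(\cdot)$ is the squared gain function for $\{\tilde{g}_{J,l}\}$. The squared gain functions for
$\{\tilde{h}_{j,l}\}$ and $\{\tilde{g}_{J,l}\}$ are given by a formula equivalent to Equation (3.6) for squared gain
functions, i.e.,
\[
\widetilde{\mathcal{H}}_j(f) = \widetilde{\mathcal{H}}(2^{j-1} f) \prod_{l=0}^{j-2} \widetilde{\mathcal{G}}(2^l f) \quad \text{and} \quad \widetilde{\mathcal{G}}_J(f) = \prod_{l=0}^{J-1} \widetilde{\mathcal{G}}(2^l f).
\]
Since $\widetilde{\mathcal{H}}(f) = \mathcal{H}(f)/2$ and $\widetilde{\mathcal{G}}(f) = \mathcal{G}(f)/2$, we may use Equation (3.5) to say that
$\widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) = 1$ for all $f$. We therefore have
\[
\begin{aligned}
\mathrm{Cov}\{X_t, Y_t\} &= \int_{-1/2}^{1/2} S_{XY}(f) \, df = \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) \right] S_{XY}(f) \, df \\
&= \mathrm{Cov}\left\{ \widetilde{V}_{1,t}^{(X)}, \widetilde{V}_{1,t}^{(Y)} \right\} + \gamma_{XY}(\lambda_1),
\end{aligned}
\]
and the case when $J = 1$ holds. We now proceed to prove the main assertion by
induction. Assume the property holds for $J - 1$, i.e.,
\[
\mathrm{Cov}\{X_t, Y_t\} = \mathrm{Cov}\left\{ \widetilde{V}_{J-1,t}^{(X)}, \widetilde{V}_{J-1,t}^{(Y)} \right\} + \sum_{j=1}^{J-1} \gamma_{XY}(\lambda_j).
\]
So we have
\[
\begin{aligned}
\mathrm{Cov}\left\{ \widetilde{V}_{J-1,t}^{(X)}, \widetilde{V}_{J-1,t}^{(Y)} \right\}
&= \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_{J-1}(f) S_{XY}(f) \, df \\
&= \int_{-1/2}^{1/2} \left[ \prod_{l=0}^{J-2} \widetilde{\mathcal{G}}(2^l f) \right] S_{XY}(f) \, df \\
&= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}(2^{J-1} f) + \widetilde{\mathcal{H}}(2^{J-1} f) \right] \left[ \prod_{l=0}^{J-2} \widetilde{\mathcal{G}}(2^l f) \right] S_{XY}(f) \, df \\
&= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}_J(f) + \widetilde{\mathcal{H}}_J(f) \right] S_{XY}(f) \, df \\
&= \mathrm{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \gamma_{XY}(\lambda_J).
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
\mathrm{Cov}\{X_t, Y_t\} &= \mathrm{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \gamma_{XY}(\lambda_J) + \sum_{j=1}^{J-1} \gamma_{XY}(\lambda_j) \\
&= \mathrm{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \sum_{j=1}^{J} \gamma_{XY}(\lambda_j). \qquad \Box
\end{aligned}
\]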
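A sample analogue of the finite decomposition can be checked numerically. The sketch below is an illustration under stated assumptions, not the estimator developed later in the chapter: it uses the Haar MODWT with circular boundary conditions, implemented directly with `np.roll`, and covariances normalized by N. With these choices the identity holds exactly for any series, because the Haar MODWT filters satisfy $\widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) = 1$. The function names are hypothetical.

```python
import numpy as np

def haar_modwt(x, J):
    """Haar MODWT via the pyramid algorithm with circular filtering.
    Returns ([W_1, ..., W_J], V_J)."""
    v = np.asarray(x, dtype=float)
    wavelet = []
    for j in range(1, J + 1):
        lagged = np.roll(v, 2 ** (j - 1))   # lagged[t] = v[t - 2^(j-1)], circularly
        wavelet.append(0.5 * (v - lagged))  # level-j wavelet coefficients
        v = 0.5 * (v + lagged)              # level-j scaling coefficients
    return wavelet, v

def sample_cov(a, b):
    """Sample covariance with 1/N normalisation."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
y = 0.6 * x + 0.8 * rng.standard_normal(512)

J = 4
wx, vx = haar_modwt(x, J)
wy, vy = haar_modwt(y, J)
wavelet_cov = [np.mean(a * b) for a, b in zip(wx, wy)]  # one term per scale
total = sample_cov(vx, vy) + sum(wavelet_cov)
print(total, sample_cov(x, y))
```

The two printed values agree up to floating point error, mirroring the statement of Proposition 5.2 with the scaling coefficient covariance as the remainder term.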
The decomposition of covariance will now be established by allowing $J \to \infty$.
This is intuitively plausible since the wavelet filter is capturing smaller and smaller
portions of the cross spectrum as $J$ gets larger.

Theorem 5.1 Let $\{X_t\}$ and $\{Y_t\}$ be stationary processes with autospectra $S_X(\cdot)$ and
$S_Y(\cdot)$, respectively, and let $\gamma_{XY}(\lambda_j)$ be the wavelet covariance associated with scale
$\lambda_j$; then
\[
\sum_{j=1}^{\infty} \gamma_{XY}(\lambda_j) = \mathrm{Cov}\{X_t, Y_t\},
\]
that is, the wavelet covariance decomposes the covariance between $\{X_t\}$ and $\{Y_t\}$ on
a scale by scale basis.
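
Lemma 5.1 For all $\epsilon > 0$, there exists a $J_\epsilon$ such that
$\left| \mathrm{Cov}\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \} \right| < \epsilon$ for $J > J_\epsilon$.

Proof Because $\sum_l g_{J,l}^2 = 1$ and $\tilde{g}_{J,l} = g_{J,l}/2^{J/2}$, we have $\sum_l \tilde{g}_{J,l}^2 = 1/2^J$. Parseval's
relation (cf. Section A.2) tells us that
\[
\int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) \, df = \int_{-1/2}^{1/2} \left| \widetilde{G}_J(f) \right|^2 df = \sum_{l=0}^{L_J - 1} \tilde{g}_{J,l}^2 = \frac{1}{2^J}.
\]
Recall that the amplitude spectrum $A_{XY}(f) \equiv |S_{XY}(f)|$ is a non-negative real-valued
function (cf. Section C.1). Hence, if $A_{XY}(\cdot)$ is bounded by some finite number
$C$, then for $J > J_\epsilon$,
\[
\begin{aligned}
\left| \mathrm{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} \right|
&= \left| \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f) \, df \right| \\
&\leq \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) \left| S_{XY}(f) \right| df \\
&= \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) A_{XY}(f) \, df \\
&\leq C \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) \, df = \frac{C}{2^J} < \epsilon.
\end{aligned}
\]
If $A_{XY}(\cdot)$ cannot be bounded by any finite number $C$, there at least exists a constant
$C$ such that
\[
\int_{A_{XY}(f) \geq C} A_{XY}(f) \, df < \frac{\epsilon}{2},
\]
using a Lebesgue integral. A rough bound on the squared gain function of the scaling
filter for Daubechies wavelets is $\widetilde{\mathcal{G}}_J(f) \leq 1$, so for all $J > J_\epsilon$,
\[
\begin{aligned}
\left| \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f) \, df \right|
&= \left| \int_{A_{XY}(f) \geq C} \widetilde{\mathcal{G}}_J(f) S_{XY}(f) \, df + \int_{A_{XY}(f) < C} \widetilde{\mathcal{G}}_J(f) S_{XY}(f) \, df \right| \\
&\leq \int_{A_{XY}(f) \geq C} \widetilde{\mathcal{G}}_J(f) \left| S_{XY}(f) \right| df + \int_{A_{XY}(f) < C} \widetilde{\mathcal{G}}_J(f) \left| S_{XY}(f) \right| df \\
&\leq \int_{A_{XY}(f) \geq C} A_{XY}(f) \, df + C \int_{A_{XY}(f) < C} \widetilde{\mathcal{G}}_J(f) \, df \\
&\leq \frac{\epsilon}{2} + C \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) \, df \\
&\leq \frac{\epsilon}{2} + \frac{C}{2^J} < \epsilon. \qquad \Box
\end{aligned}
\]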

Proof of Theorem 5.1. From Proposition 5.2, the decomposition of covariance
between $\{X_t, Y_t\}$ has been established for a finite number of scales, say $J$. From
Lemma 5.1, as $J \to \infty$ the remaining covariance between the scaling coefficients goes
to zero. Hence, the theorem is established.
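
The decomposition can be checked numerically on a sample analogue. The sketch below is an illustration only, under assumptions not made in the text: it uses the Haar filter, circular (periodic) filtering, and raw cross-products of two short zero-sum series in place of population covariances. The helper names `haar_modwt` and `mean_product` are hypothetical.

```python
def haar_modwt(x, levels):
    """Circular Haar MODWT via the pyramid algorithm.

    Returns (details, smooth): details[j-1] holds the level-j wavelet
    coefficients, smooth holds the scaling coefficients at the last level.
    """
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)
        # Haar MODWT filters: wavelet (1/2, -1/2), scaling (1/2, 1/2).
        w = [(v[t] - v[(t - shift) % n]) / 2.0 for t in range(n)]
        v = [(v[t] + v[(t - shift) % n]) / 2.0 for t in range(n)]
        details.append(w)
    return details, v


def mean_product(a, b):
    """Average cross-product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


# Two illustrative series, each summing to zero, so the mean cross-product
# equals the (biased) sample covariance.
x = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0, 0.0, -3.0]
y = [1.0, 0.5, -1.5, 2.0, -1.0, 0.0, 2.5, -3.5]

wx, vx = haar_modwt(x, 3)
wy, vy = haar_modwt(y, 3)

per_scale = [mean_product(a, b) for a, b in zip(wx, wy)]
smooth_term = mean_product(vx, vy)
total = sum(per_scale) + smooth_term
sample_cov = mean_product(x, y)
```

Here `total` and `sample_cov` agree, and the contribution of the scaling coefficients vanishes because the level-3 Haar smooth of a length-8 zero-sum series is identically zero. This mirrors, for a finite sample, the role Lemma 5.1 plays in the limit.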

The definition of the wavelet covariance can be expanded to the wavelet cross-covariance
$\gamma_{\tau,XY}(\lambda_j)$ by allowing one sequence of wavelet coefficients to be shifted by
a specific lag $\tau$; i.e.,
\[
\gamma_{\tau,XY}(\lambda_j) \equiv \frac{1}{2\lambda_j} \mathrm{Cov}\left\{ W_{j,t}^{(X)}, W_{j,t+\tau}^{(Y)} \right\}.
\]
Theorem 5.1 still holds since the spectral properties of $\{Y_{t+\tau}\}$ are identical to those
of $\{Y_t\}$.

5.1.2 Wavelet Correlation


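We can define another quantity, based on the wavelet covariance: the wavelet cross-correlation
for scale $\lambda_j$, defined as
\[
\rho_{\tau,XY}(\lambda_j) \equiv \frac{\gamma_{\tau,XY}(\lambda_j)}{\sigma_X(\lambda_j)\, \sigma_Y(\lambda_j)},
\]
where $\tau$ is the lag, and $\sigma_X^2(\lambda_j)$ and $\sigma_Y^2(\lambda_j)$ are the wavelet variances for $\{X_t\}$ and $\{Y_t\}$
associated with scale $\lambda_j$, respectively. At lag $\tau = 0$, this is simply the wavelet correlation
$\rho_{XY}(\lambda_j)$. As with the usual correlation coefficient between two random
variables, $-1 \le \rho_{\tau,XY}(\lambda_j) \le 1$ for all $\tau$ and $j$. This can be shown using the Cauchy-Schwarz
inequality,
\[
\begin{aligned}
2\lambda_j \left| \gamma_{\tau,XY}(\lambda_j) \right|
&= \left| \mathrm{Cov}\left\{ W_{j,t}^{(X)}, W_{j,t+\tau}^{(Y)} \right\} \right| \\
&\le \left( E\left\{ \left[ W_{j,t}^{(X)} \right]^2 \right\} E\left\{ \left[ W_{j,t+\tau}^{(Y)} \right]^2 \right\} \right)^{1/2} \\
&= \left( 2\lambda_j \sigma_X^2(\lambda_j) \cdot 2\lambda_j \sigma_Y^2(\lambda_j) \right)^{1/2} = 2\lambda_j\, \sigma_X(\lambda_j)\, \sigma_Y(\lambda_j).
\end{aligned}
\]
Therefore, the magnitude of the wavelet cross-correlation is bounded:
\[
\left| \rho_{\tau,XY}(\lambda_j) \right| = \left| \frac{\gamma_{\tau,XY}(\lambda_j)}{\sigma_X(\lambda_j)\, \sigma_Y(\lambda_j)} \right| \le 1.
\]
The wavelet correlation is analogous to its Fourier equivalent, the complex coherency
(Equation (C.2)). Just as the cross-correlation is used to determine lead/lag relationships
between two processes, the wavelet cross-correlation should be able to provide
a lead/lag relationship on a scale by scale basis.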

5.2 Estimating the Wavelet Covariance


Let $\{X_t\}$ be a stochastic process whose $d$th order backward difference is a stationary
process (cf. Section 2.1.1). Let $\{\tilde W_{1,t}^{(X)}\}$ be the MODWT wavelet coefficients for $\{X_t\}$
associated with unit scale. Percival (1995) showed that if $L \ge 2d$, then $\{\tilde W_{1,t}^{(X)}\}$ is a
stationary process with zero mean and spectrum defined to be $S_{1,X}(f) = \tilde{\mathcal H}(f) S_X(f)$.
Percival and Walden (1999, Sec. 8.2) extended this result to arbitrary $\lambda_j$ using Equation (4.5),
with a slight modification, to yield
\[
\mathcal H_j(f) = \mathcal D^{d}(f)\, \mathcal D^{\frac{L}{2} - d}(f) \left[ \prod_{k=0}^{j-2} 4 \cos^2(\pi 2^k f) \right]^{\frac{L}{2}} \mathcal C(2^{j-1} f)\, \mathcal G_{j-1}^{(D)}(f) = \mathcal D^{d}(f)\, \mathcal A_j(f).
\]
We recognize that $\tilde{\mathcal H}_j(f)$ is a two-stage cascade filter, with the second filter $\mathcal A_j(\cdot)$
having a form which can be factored into a filter of finite length (Daubechies 1992,
Ch. 6). The output from the first filter is a stationary process by design ($L \ge 2d$).
Filtering a zero mean stationary process with a filter of finite length produces a zero
mean stationary process (Priestley 1981), which establishes the claim in general. We
now proceed to provide distributional results for estimators of the wavelet covariance.

5.2.1 The MODWT Estimator


Suppose $X_t$ and $Y_t$, $t = 0, \ldots, N-1$, can be regarded as realizations of portions of
the processes $\{X_t\}$ and $\{Y_t\}$, whose $d_X$th and $d_Y$th order backward differences form
stationary processes, respectively, and define
\[
d \equiv \max(d_X, d_Y).
\]
For $N \ge L_j$, we can define an unbiased estimator $\tilde\gamma_{XY}(\lambda_j)$ of the wavelet covariance
based upon the MODWT via
\[
\tilde\gamma_{XY}(\lambda_j) \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)}, \qquad (5.3)
\]
where $\tilde N_j = N - L_j + 1$. Note, the estimator does not include any coefficients
that make explicit use of the periodic boundary conditions. We can construct a
biased estimator of the wavelet covariance by simply including the MODWT wavelet
coefficients affected by the boundary into Equation (5.3) and renormalizing.
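
As a minimal sketch of Equation (5.3), the following computes the unbiased estimator from two sequences of level-$j$ MODWT coefficients that are assumed to have been computed already. The function names are hypothetical, and the width $L_j = (2^j - 1)(L - 1) + 1$ of the level-$j$ filter is the standard expression for a base filter of width $L$, which is assumed here rather than restated from this section.

```python
def modwt_filter_width(j, L):
    """Width L_j of the level-j MODWT filter built from a width-L base filter."""
    return (2 ** j - 1) * (L - 1) + 1


def unbiased_wavelet_cov(wx, wy, j, L):
    """Equation (5.3): average the products of the coefficients that are
    not affected by the periodic boundary conditions."""
    n = len(wx)
    if len(wy) != n:
        raise ValueError("coefficient sequences must have equal length")
    Lj = modwt_filter_width(j, L)
    if n < Lj:
        raise ValueError("need N >= L_j")
    n_tilde = n - Lj + 1
    # Indices l = L_j - 1, ..., N - 1 are free of boundary effects.
    return sum(wx[l] * wy[l] for l in range(Lj - 1, n)) / n_tilde


# Illustrative level-1 Haar coefficients (L = 2, so L_1 = 2): the first
# coefficient is a boundary coefficient and is skipped.
example = unbiased_wavelet_cov([9.0, 1.0, 2.0, 3.0], [9.0, 1.0, 1.0, 1.0], j=1, L=2)
```

Including index 0 and dividing by $N$ instead would give the biased estimator described above.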
Following an argument similar to Brillinger (1979) to show the asymptotic normality
of $\tilde\gamma_{XY}(\lambda_j)$, we require the following central limit theorem.

Lemma 5.2 Let $\mathbf Z_1, \ldots, \mathbf Z_N$ be a realization from the vector-valued process $\{\mathbf Z_t\}$,
with mean vector $\boldsymbol\mu$, whose joint cumulant sequence is absolutely summable. Let
\[
\bar{\mathbf Z}_N \equiv \frac{1}{N} \sum_{t=1}^{N} \mathbf Z_t
\]
be the vector of sample means. Then $\bar{\mathbf Z}_N$ is asymptotically normally distributed with
mean vector $\boldsymbol\mu$ and large sample variance $N^{-1} \mathbf S_{\mathbf Z}(0)$, where $\mathbf S_{\mathbf Z}(0)$ is the spectral matrix
for $\{\mathbf Z_t\}$ evaluated at the frequency $f = 0$.

Proof. This is a special case of Theorem 4.4.1 in Brillinger (1981, p. 94).

If $\mathbf Z_1, \ldots, \mathbf Z_N$ is a univariate time series, let us call it $Z_1, \ldots, Z_N$, then $\bar Z_N$ is
asymptotically normal with mean $\mu$ and large sample variance $N^{-1} S_Z(0)$, where
$S_Z(0)$ is the spectral density function for $\{Z_t\}$ evaluated at $f = 0$. This scalar case will be utilized in
the proof of the following theorem.
Theorem 5.2 Let $L > 2d$, and suppose $\{\tilde W_{j,t}^{(X)}, \tilde W_{j,t}^{(Y)}\}$ is a bivariate Gaussian weakly
stationary process with autospectra satisfying
\[
\int_{-1/2}^{1/2} S_{j,X}^2(f)\, df < \infty \quad \text{and} \quad \int_{-1/2}^{1/2} S_{j,Y}^2(f)\, df < \infty,
\]
then the MODWT estimator $\tilde\gamma_{XY}(\lambda_j)$ of the wavelet covariance is asymptotically normally
distributed with mean $\gamma_{XY}(\lambda_j)$ and large sample variance given by $\tilde N_j^{-1} S_{j,(XY)}(0)$,
where $S_{j,(XY)}(0)$ is the spectral density function for $\{\tilde W_{j,t}^{(X)} \tilde W_{j,t}^{(Y)}\}$ (the product of the
wavelet coefficients) evaluated at $f = 0$.

Proof. Since $L > 2d$, we have that both sets of wavelet coefficients $\{\tilde W_{j,t}^{(X)}\}$ and
$\{\tilde W_{j,t}^{(Y)}\}$ have mean zero. Since $\{\tilde W_{j,t}^{(X)}, \tilde W_{j,t}^{(Y)}\}$ is a bivariate Gaussian second-order
stationary process, it is strictly stationary. Square integrability of the autospectra
implies that
\[
\{s_{j,\tau,X}\} \longleftrightarrow S_{j,X}(\cdot) \quad \text{and} \quad \{s_{j,\tau,Y}\} \longleftrightarrow S_{j,Y}(\cdot);
\]
i.e., the autocovariance sequences and autospectra are Fourier transform pairs. Because
$L > 2d$, the squared gain function for Daubechies wavelet filters guarantees we
have
\[
S_{j,X}(0) = 0 = \sum_{\tau=-\infty}^{\infty} s_{j,\tau,X}.
\]
A similar statement holds for $\{\tilde W_{j,t}^{(Y)}\}$ and, therefore, $\{s_{j,\tau,X}\}$ and $\{s_{j,\tau,Y}\}$ are absolutely
summable. Let $S_{j,XY}(f) \equiv \tilde{\mathcal H}_j(f) S_{XY}(f)$ denote the MODWT filtered cross
spectrum. From the magnitude squared coherence being bounded by unity, and using
the Cauchy-Schwarz inequality, we know that
\[
\begin{aligned}
\int_{-1/2}^{1/2} \left| S_{j,XY}(f) \right|^2 df
&\le \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,Y}(f)\, df \\
&\le \left( \int_{-1/2}^{1/2} S_{j,X}^2(f)\, df \int_{-1/2}^{1/2} S_{j,Y}^2(f)\, df \right)^{1/2} < \infty.
\end{aligned}
\]
So the cross-covariance sequence and cross spectrum associated with scale $\lambda_j$ are
also a Fourier pair and, again, by using Daubechies wavelet filters with $L > 2d$,
we have $S_{j,XY}(0) = 0$. Therefore, the cross-covariance sequence for $\{\tilde W_{j,t}^{(X)}, \tilde W_{j,t}^{(Y)}\}$ is
absolutely summable.
We first note that the MODWT estimate of the wavelet covariance $\tilde\gamma_{XY}(\lambda_j)$ is
essentially a sample mean for the time series $\tilde W_{j,t}^{(XY)} \equiv \tilde W_{j,t}^{(X)} \tilde W_{j,t}^{(Y)}$ (cf. Equation 5.1).
This process also has an absolutely summable cumulant sequence by Theorem 2.9.1
of Brillinger (1981, p. 38). Lemma 5.2 tells us that $\tilde\gamma_{XY}(\lambda_j)$ is asymptotically normal
with mean $\gamma_{XY}(\lambda_j)$ and large sample variance given by $\tilde N_j^{-1} S_{j,(XY)}(0)$, where
$S_{j,(XY)}(0)$ is the spectral density for $\tilde W_{j,t}^{(XY)}$ evaluated at $f = 0$.
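It is easy to see that the estimated wavelet covariance is unbiased. Let $\tilde\gamma_{XY}(\lambda_j)$
be the MODWT estimator of the wavelet covariance for scale $\lambda_j$, then
\[
\begin{aligned}
E\{\tilde\gamma_{XY}(\lambda_j)\}
&= E\left\{ \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)} \right\} \\
&= \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} E\left\{ \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)} \right\} \\
&= \frac{1}{2\lambda_j} E\left\{ W_{j,l}^{(X)} W_{j,l}^{(Y)} \right\} = \gamma_{XY}(\lambda_j).
\end{aligned}
\]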

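Since we are exclusively interested in Gaussian processes, $S_{j,(XY)}(0)$ may be re-expressed
as a function of the auto and cross spectra of the wavelet coefficients $\{\tilde W_{j,l}^{(X)}\}$
and $\{\tilde W_{j,l}^{(Y)}\}$. The variance of the estimated MODWT wavelet covariance at scale $\lambda_j$
can be computed directly via
\[
\begin{aligned}
\mathrm{Var}\{\tilde\gamma_{XY}(\lambda_j)\}
&= \mathrm{Var}\left\{ \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)} \right\} \\
&= \frac{1}{\tilde N_j^2} \sum_{l=L_j-1}^{N-1} \sum_{m=L_j-1}^{N-1} \mathrm{Cov}\left\{ \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)},\ \tilde W_{j,m}^{(X)} \tilde W_{j,m}^{(Y)} \right\} \\
&= \frac{1}{\tilde N_j} \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) \mathrm{Cov}\left\{ \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)},\ \tilde W_{j,l+\tau}^{(X)} \tilde W_{j,l+\tau}^{(Y)} \right\} \\
&\equiv \frac{1}{\tilde N_j} \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) s_{j,\tau,XY}, \qquad (5.4)
\end{aligned}
\]
where $\{s_{j,\tau,XY}\}$ is the autocovariance sequence for the product of the scale $\lambda_j$ MODWT
wavelet coefficients with respect to $\{X_t\}$ and $\{Y_t\}$.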
Let us look at the covariance between the product of wavelet coefficients more
closely. We will need the following fact. If $A, B, C, D$ are real-valued Gaussian random
variables with zero mean, then we can use the Isserlis theorem (Isserlis 1918) to claim
\[
\mathrm{Cov}\{AB, CD\} = \mathrm{Cov}\{A, C\}\, \mathrm{Cov}\{B, D\} + \mathrm{Cov}\{A, D\}\, \mathrm{Cov}\{B, C\}. \qquad (5.5)
\]

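Now let $\{U_t\}$ and $\{V_t\}$ be real-valued, zero mean Gaussian stationary processes with autospectra $S_U(\cdot)$
and $S_V(\cdot)$, respectively, and let $Z_t = U_t V_t$. Using Equation (5.5) we have
\[
\begin{aligned}
\mathrm{Cov}\{U_t V_t, U_{t+\tau} V_{t+\tau}\}
&= \mathrm{Cov}\{U_t, U_{t+\tau}\}\, \mathrm{Cov}\{V_t, V_{t+\tau}\} + \mathrm{Cov}\{U_t, V_{t+\tau}\}\, \mathrm{Cov}\{V_t, U_{t+\tau}\} \\
&= s_{\tau,U}\, s_{\tau,V} + C_{\tau,UV}\, C_{\tau,VU} \\
&\equiv s_{\tau,Z},
\end{aligned}
\]
where $\{s_{\tau,U}\}$ and $\{s_{\tau,V}\}$ are the autocovariance sequences of $\{U_t\}$ and $\{V_t\}$, respectively,
and $\{C_{\tau,UV}\}$ is the cross-covariance sequence for $\{U_t, V_t\}$. The following Fourier
relationships hold:
\[
\begin{aligned}
\{s_{\tau,U}\} &\longleftrightarrow S_U(\cdot) \\
\{s_{\tau,V}\} &\longleftrightarrow S_V(\cdot) \\
\{C_{\tau,UV}\} &\longleftrightarrow S_{UV}(\cdot) \\
\{C_{\tau,VU}\} &\longleftrightarrow S_{VU}(\cdot).
\end{aligned}
\]
Using the linearity and convolution properties of the Fourier transform (cf. Section A.2),
the spectrum of $\{Z_t\}$ is
\[
S_Z(f) = \int_{-1/2}^{1/2} S_U(f') S_V(f - f')\, df' + \int_{-1/2}^{1/2} S_{UV}(f') S_{VU}(f - f')\, df'
\]
and therefore,
\[
\begin{aligned}
S_Z(0)
&= \int_{-1/2}^{1/2} S_U(f') S_V(0 - f')\, df' + \int_{-1/2}^{1/2} S_{UV}(f') S_{VU}(0 - f')\, df' \\
&= \int_{-1/2}^{1/2} S_U(f) S_V(f)\, df + \int_{-1/2}^{1/2} S_{UV}(f) S_{VU}(-f)\, df \\
&= \int_{-1/2}^{1/2} S_U(f) S_V(f)\, df + \int_{-1/2}^{1/2} S_{UV}^2(f)\, df.
\end{aligned}
\]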

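Since we have the Fourier relationship $\{s_{\tau,Z}\} \longleftrightarrow S_Z(\cdot)$, we necessarily have
\[
S_Z(0) = \sum_{\tau=-\infty}^{\infty} s_{\tau,Z}
\]
when $f = 0$. Re-examining Equation (5.4) and utilizing Cesàro summability (Titchmarsh
1939, p. 411), we can say
\[
\begin{aligned}
\lim_{\tilde N_j \to \infty} \tilde N_j\, \mathrm{Var}\{\tilde\gamma_{XY}(\lambda_j)\}
&= \lim_{\tilde N_j \to \infty} \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) s_{j,\tau,XY} \\
&= \sum_{\tau=-\infty}^{\infty} s_{j,\tau,XY} = S_{j,(XY)}(0),
\end{aligned}
\]
where
\[
S_{j,(XY)}(0) = \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,Y}(f)\, df + \int_{-1/2}^{1/2} S_{j,XY}^2(f)\, df \equiv V_j. \qquad (5.6)
\]
Hence, we have the following result:
\[
\mathrm{Var}\{\tilde\gamma_{XY}(\lambda_j)\} \approx \frac{V_j}{\tilde N_j}
\]
for large $\tilde N_j$. This will allow us to construct approximate confidence intervals for the
MODWT estimator of the wavelet covariance in Section 5.4.1.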

5.2.2 The DWT Estimator


An estimator of the wavelet covariance based upon the DWT can be similarly defined.
First, we must calculate how many wavelet coefficients are affected by the boundary.
If we let
\[
L'_j \equiv \left\lceil (L - 2) \left( 1 - \frac{1}{2^j} \right) \right\rceil, \qquad (5.7)
\]
then the number of DWT wavelet coefficients affected by the boundary is $L'_j$ (Percival
and Walden 1999, Sec. 4.10). So we can define our unbiased estimator of the wavelet
covariance based on the DWT to be
\[
\hat\gamma_{XY}(\lambda_j) \equiv \frac{1}{2\lambda_j \hat N_j} \sum_{l=L'_j}^{N_j-1} W_{j,l}^{(X)} W_{j,l}^{(Y)}, \qquad (5.8)
\]
where $N_j = N/2^j$ and $\hat N_j = N_j - L'_j$. Again, including those DWT coefficients
affected by the boundary into Equation (5.8) yields the biased DWT estimator of
wavelet covariance.
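
The bookkeeping in Equations (5.7) and (5.8) is easy to get wrong by one, so a small sketch may help. The function names are hypothetical, and the sample size is assumed to be a multiple of $2^j$; integer arithmetic is used for the ceiling to avoid floating point rounding.

```python
def dwt_boundary_count(j, L):
    """L'_j = ceil((L - 2) * (1 - 2**-j)), Equation (5.7), in exact integers."""
    numerator = (L - 2) * (2 ** j - 1)
    denominator = 2 ** j
    return -((-numerator) // denominator)  # integer ceiling division


def dwt_effective_size(N, j, L):
    """N-hat_j = N_j - L'_j, the number of level-j DWT coefficients
    that are free of boundary effects (N assumed divisible by 2**j)."""
    Nj = N // 2 ** j
    return Nj - dwt_boundary_count(j, L)


# Illustrative values for a width-4 filter and N = 1024.
counts = [dwt_boundary_count(j, 4) for j in (1, 2, 3)]
n_hat_3 = dwt_effective_size(1024, 3, 4)
```

For the Haar filter ($L = 2$) no DWT coefficients are affected by the boundary, so $\hat N_j = N_j$ at every level.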
It is easy to show that $\hat\gamma_{XY}(\lambda_j)$ is an unbiased estimator of the wavelet covariance
for scale $\lambda_j$ via
\[
E\{\hat\gamma_{XY}(\lambda_j)\} = E\left\{ \frac{1}{2\lambda_j \hat N_j} \sum_{l=L'_j}^{N_j-1} W_{j,l}^{(X)} W_{j,l}^{(Y)} \right\} = \frac{1}{2\lambda_j} E\left\{ W_{j,l}^{(X)} W_{j,l}^{(Y)} \right\},
\]
which is equivalent to Equation (5.1). In fact, from Theorem 5.2 we know that
$\hat\gamma_{XY}(\lambda_j)$ is unbiased with large sample variance $V'_j / \hat N_j$, where $V'_j$ involves the auto
and cross spectra of the scale $\lambda_j$ DWT wavelet coefficients. The large sample variance
for the DWT wavelet covariance $\hat\gamma_{XY}(\lambda_j)$ follows from Equation (5.6) and is defined
to be
\[
V'_j \equiv \int_{-1/2}^{1/2} S'_{j,X}(f) S'_{j,Y}(f)\, df + \int_{-1/2}^{1/2} \left[ S'_{j,XY}(f) \right]^2 df.
\]

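It has already been stated (Section 3.3) that the DWT coefficients may be obtained
from subsampling the MODWT coefficients on a scale by scale basis. We used this
fact to define the spectrum of DWT wavelet coefficients for scale $\lambda_j$ in Section 4.1. For
our purposes here, we explicitly define the auto and cross spectrum for the scale $\lambda_j$
DWT wavelet coefficients to be
\[
S'_{j,X}(f) \equiv \frac{1}{2^j} \sum_{k=0}^{2^j-1} \tilde{\mathcal H}_j\!\left( \tfrac{1}{2^j} f + \tfrac{k}{2^j} \right) S_X\!\left( \tfrac{1}{2^j} f + \tfrac{k}{2^j} \right) \qquad (5.9)
\]
and
\[
S'_{j,XY}(f) \equiv \frac{1}{2^j} \sum_{k=0}^{2^j-1} \tilde{\mathcal H}_j\!\left( \tfrac{1}{2^j} f + \tfrac{k}{2^j} \right) S_{XY}\!\left( \tfrac{1}{2^j} f + \tfrac{k}{2^j} \right), \qquad (5.10)
\]
respectively, where $\tilde{\mathcal H}_j(\cdot)$ is the squared gain function for the scale $\lambda_j$ MODWT
wavelet coefficients.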
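Let us look at a simple relationship between two processes, linear regression with
delay; see, e.g., Priestley (1981, pp. 663-664). Thus, if we have two time series $\{X_t\}$
and $\{Y_t\}$ with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively, they are related via
\[
Y_t = c X_{t-d} + \varepsilon_t
\]
and using properties of the Fourier transform (cf. Section A.2) we know their spectra
are related via
\[
S_Y(f) = c^2 S_X(f) + \sigma_\varepsilon^2 \Delta t.
\]
Their cross spectrum is given by $S_{XY}(f) = c\, e^{-i 2\pi f d \Delta t} S_X(f)$, with co-spectrum taking
the form $R_{XY}(f) = c \cos(2\pi f d \Delta t) S_X(f)$ and quadrature spectrum $Q_{XY}(f) =
c \sin(2\pi f d \Delta t) S_X(f)$. For simplicity, we assume the time series $\{X_t\}$ is white noise
with variance $\sigma_X^2$ and take $\Delta t = 1$. Applying these definitions to the DWT wavelet coefficients of $\{X_t\}$
for unit scale, we have
\[
S'_{1,X}(f) = \frac{ \tilde{\mathcal H}_1\!\left( \tfrac12 f \right) S_X\!\left( \tfrac12 f \right) + \tilde{\mathcal H}_1\!\left( \tfrac12 f + \tfrac12 \right) S_X\!\left( \tfrac12 f + \tfrac12 \right) }{2} = \frac{\sigma_X^2}{2}
\]
since $\tilde{\mathcal H}_1(\tfrac12 f) + \tilde{\mathcal H}_1(\tfrac12 f + \tfrac12) = 1$ by Equation (3.5). Knowing this, the spectrum for
the DWT wavelet coefficients of $\{Y_t\}$ is
\[
S'_{1,Y}(f) = \frac{c^2 \sigma_X^2}{2} + \frac{\sigma_\varepsilon^2}{2}.
\]
Finally, the cross-spectrum for the DWT wavelet coefficients is
\[
\begin{aligned}
S'_{1,XY}(f)
&= \frac{ \tilde{\mathcal H}_1\!\left( \tfrac12 f \right) S_{XY}\!\left( \tfrac12 f \right) + \tilde{\mathcal H}_1\!\left( \tfrac12 f + \tfrac12 \right) S_{XY}\!\left( \tfrac12 f + \tfrac12 \right) }{2} \\
&= \frac{c\, \sigma_X^2}{2} \left[ \tilde{\mathcal H}_1\!\left( \tfrac12 f \right) e^{-i\pi f d} + \tilde{\mathcal H}_1\!\left( \tfrac12 f + \tfrac12 \right) e^{-i\pi f d} e^{-i\pi d} \right] \\
&= \frac{c\, \sigma_X^2}{2}\, e^{-i\pi f d} \left[ \tilde{\mathcal H}_1\!\left( \tfrac12 f \right) + (-1)^d\, \tilde{\mathcal H}_1\!\left( \tfrac12 f + \tfrac12 \right) \right] \\
&= \frac{c\, \sigma_X^2}{2}\, e^{-i\pi f d}
\begin{cases}
1, & d \text{ even;} \\
\tilde{\mathcal H}_1\!\left( \tfrac12 f \right) - \tilde{\mathcal H}_1\!\left( \tfrac12 f + \tfrac12 \right), & d \text{ odd.}
\end{cases}
\end{aligned}
\]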

Hence, the variance of $\hat\gamma_{XY}(\lambda_1)$ will take on two different values depending upon the
delay between $\{X_t\}$ and $\{Y_t\}$. The MODWT estimator of the wavelet covariance,
because of its lack of subsampling, does not suffer from this problem.
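
The even/odd dichotomy can be made concrete for one particular filter. The sketch below assumes the Haar wavelet filter, whose MODWT squared gain function is $\sin^2(\pi f)$; this choice and the function names are illustrative assumptions, not part of the derivation above. It evaluates the bracketed factor $\tilde{\mathcal H}_1(\tfrac12 f) + (-1)^d \tilde{\mathcal H}_1(\tfrac12 f + \tfrac12)$ from the unit-scale cross spectrum.

```python
from math import cos, pi, sin


def haar_modwt_sq_gain(f):
    """Squared gain of the unit-scale Haar MODWT wavelet filter."""
    return sin(pi * f) ** 2


def aliasing_factor(f, d):
    """Bracketed factor in the unit-scale DWT cross spectrum for delay d."""
    return haar_modwt_sq_gain(f / 2) + (-1) ** d * haar_modwt_sq_gain(f / 2 + 0.5)


f0 = 0.3
even_case = aliasing_factor(f0, 2)  # the two aliased terms add up to one
odd_case = aliasing_factor(f0, 1)   # for Haar this equals -cos(pi * f0)
```

For an even delay the factor is identically one, whereas for an odd delay it varies with frequency, which is why the variance of the DWT estimator at unit scale alternates between two values as the delay changes.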

Table 5.1: Variance of $\hat\gamma_{XY}(\lambda_j)$, $j = 1, \ldots, 6$, for two white noise time series associated
via linear regression with delay $d$. The series $\{X_t\}$ is a white noise process with
$\sigma_X^2 = 1$, $c = 1$ and $\sigma_\varepsilon^2 = 0$.

                          Level
  d       1        2        3        4        5        6
  0    0.9933   0.5002   0.2452   0.1247   0.0613   0.0297
  1    0.7477   0.2803   0.1732   0.1045   0.0563   0.0286
  2    0.9912   0.3744   0.1402   0.0887   0.0512   0.0271
  3    0.7468   0.2799   0.1412   0.0767   0.0461   0.0262
  4    0.9893   0.4981   0.1828   0.0697   0.0428   0.0253
  5    0.7455   0.2801   0.1422   0.0678   0.0396   0.0244
  6    0.9877   0.3734   0.1399   0.0709   0.0370   0.0231
  7    0.7446   0.2799   0.1717   0.0780   0.0352   0.0217
  8    0.9861   0.4964   0.2434   0.0915   0.0340   0.0211

To illustrate this feature, I performed a simulation using bivariate time series
related via linear regression with delay. For simplicity, $\{X_t\}$ is a white noise process
with $\sigma_X^2 = 1$, $c = 1$ and $\sigma_\varepsilon^2 = 0$. The results of the simulation are presented in
Table 5.1. Notice that two values are repeated over and over again in the column
corresponding to the first level of the DWT. The spectral estimates involved in the
variance of $\hat\gamma_{XY}(\lambda_2)$ can be related to the average of four aliased versions of the
spectra in the variance of the MODWT estimator of the wavelet covariance through
Equations (5.9) and (5.10). Hence, we would expect to observe four distinct values in
the variance of $\hat\gamma_{XY}(\lambda_2)$, and that is exactly what is displayed in Table 5.1. Although
not shown here, the pattern continues with 8 distinct values for $\hat\gamma_{XY}(\lambda_3)$, 16 distinct
values for $\hat\gamma_{XY}(\lambda_4)$ and so on.
A common method in spectral analysis to overcome bias due to "misalignment"
when computing bivariate estimators is to simply shift (translate) one series relative
to the other. One method to determine the lead/lag amount between the two series is
to compute the estimated cross-covariance sequence $\{\hat s^{(p)}_{\tau,XY}\}$ and look for a maximum.
Where the maximum occurs indicates the number of units to shift one series. Here, we
may compute the MODWT estimated wavelet cross-covariance sequence $\{\tilde\gamma_{\tau,XY}(\lambda_j)\}$
and look for the maximum at each scale. Shifting one set of wavelet coefficients
by these amounts may overcome this misalignment problem in practice if the DWT
estimator of the wavelet covariance is desired.

5.2.3 Estimating the Wavelet Cross-Covariance


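Estimation of the wavelet cross-covariance follows directly from the biased estimator
of the usual cross-covariance (Priestley 1981, pp. 692-693). For $N \ge L_j$, we can define
a biased estimator $\tilde\gamma_{\tau,XY}(\lambda_j)$ of the wavelet cross-covariance based upon the MODWT via
\[
\tilde\gamma_{\tau,XY}(\lambda_j) \equiv
\begin{cases}
\dfrac{1}{\tilde N_j} \displaystyle\sum_{l=L_j-1}^{N-\tau-1} \tilde W_{j,l}^{(X)} \tilde W_{j,l+\tau}^{(Y)}, & \tau = 0, \ldots, \tilde N_j - 1; \\[2ex]
\dfrac{1}{\tilde N_j} \displaystyle\sum_{l=L_j-1-\tau}^{N-1} \tilde W_{j,l}^{(X)} \tilde W_{j,l+\tau}^{(Y)}, & \tau = -1, \ldots, -(\tilde N_j - 1); \\[2ex]
0, & \text{otherwise.}
\end{cases}
\qquad (5.11)
\]
The bias is due to the normalizing factor $1/\tilde N_j$ remaining constant for all lags. We are still
not using wavelet coefficients which make use of the periodic boundary conditions.
Just as with the wavelet covariance, we can define a biased estimator of the wavelet
cross-covariance based on the DWT to be
\[
\hat\gamma_{\tau,XY}(\lambda_j) \equiv
\begin{cases}
\dfrac{1}{2\lambda_j \hat N_j} \displaystyle\sum_{l=L'_j}^{N_j-\tau-1} W_{j,l}^{(X)} W_{j,l+\tau}^{(Y)}, & \tau = 0, \ldots, \hat N_j - 1; \\[2ex]
\dfrac{1}{2\lambda_j \hat N_j} \displaystyle\sum_{l=L'_j-\tau}^{N_j-1} W_{j,l}^{(X)} W_{j,l+\tau}^{(Y)}, & \tau = -1, \ldots, -(\hat N_j - 1); \\[2ex]
0, & \text{otherwise.}
\end{cases}
\qquad (5.12)
\]
This estimator is biased for the same reason as Equation (5.11). This quantity is
provided for completeness; as stated in Section 5.2.2, the inherent subsampling of the
DWT will result in the variance of $\hat\gamma_{\tau,XY}(\lambda_j)$ being $2^j$-periodic unless the two series
are properly aligned.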
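
A minimal sketch of the MODWT estimator in Equation (5.11), together with the search for the lag at which it is largest, is given below. The coefficient sequences are assumed to be already computed, the filter width $L_j$ is passed in directly, and the function names are hypothetical.

```python
def modwt_cross_cov(wx, wy, tau, Lj):
    """Biased MODWT wavelet cross-covariance at lag tau, Equation (5.11)."""
    n = len(wx)
    n_tilde = n - Lj + 1
    if abs(tau) >= n_tilde:
        return 0.0
    if tau >= 0:
        indices = range(Lj - 1, n - tau)      # l = L_j - 1, ..., N - tau - 1
    else:
        indices = range(Lj - 1 - tau, n)      # l = L_j - 1 - tau, ..., N - 1
    # The divisor stays at N-tilde_j for every lag, hence the bias.
    return sum(wx[l] * wy[l + tau] for l in indices) / n_tilde


def lag_of_maximum(wx, wy, Lj):
    """Lag at which the estimated wavelet cross-covariance is largest."""
    n_tilde = len(wx) - Lj + 1
    lags = range(-(n_tilde - 1), n_tilde)
    return max(lags, key=lambda tau: modwt_cross_cov(wx, wy, tau, Lj))


# Illustrative coefficients: wy is wx delayed by two time units.
wx = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
wy = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
best_lag = lag_of_maximum(wx, wy, Lj=2)
```

In this toy example the maximum occurs at the lag equal to the delay, which is the amount by which one set of coefficients would be shifted before computing a DWT-based estimate.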

5.3 Estimating the Wavelet Correlation and Cross-Correlation


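Since the wavelet correlation is simply made up of the wavelet covariance for $\{X_t, Y_t\}$
and wavelet variances for $\{X_t\}$ and $\{Y_t\}$, the MODWT estimator of the wavelet
cross-correlation is simply
\[
\tilde\rho_{\tau,XY}(\lambda_j) \equiv \frac{\tilde\gamma_{\tau,XY}(\lambda_j)}{\tilde\sigma_X(\lambda_j)\, \tilde\sigma_Y(\lambda_j)}, \qquad (5.13)
\]
where $\tilde\gamma_{\tau,XY}(\lambda_j)$ is given in Equation (5.11), and $\tilde\sigma_X^2(\lambda_j)$ and $\tilde\sigma_Y^2(\lambda_j)$ are given in
Equation (3.11). When $\tau = 0$ we obtain the MODWT estimator of the wavelet
correlation between $\{X_t, Y_t\}$.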
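Large sample theory for the cross-correlation is more difficult to come by than for
the cross-covariance. The following result can be found in Fuller (1996, p. 342).

Proposition 5.3 If $\{\tilde W_{j,t}^{(X)}, \tilde W_{j,t}^{(Y)}\}$ is a bivariate Gaussian weakly stationary process
and if all autocovariance and cross-covariance sequences are absolutely summable,
then
\[
\begin{aligned}
\lim_{\tilde N_j \to \infty} \tilde N_j\, & \mathrm{Cov}\{\tilde\rho_{\tau,XY}(\lambda_j), \tilde\rho_{\upsilon,XY}(\lambda_j)\} \\
= \sum_{t=-\infty}^{\infty} \Big\{ & \rho_{t,X}(\lambda_j)\, \rho_{t+\upsilon-\tau,Y}(\lambda_j) + \rho_{t+\upsilon,XY}(\lambda_j)\, \rho_{t-\tau,YX}(\lambda_j) \\
& - \rho_{\tau,XY}(\lambda_j) \left[ \rho_{t,X}(\lambda_j)\, \rho_{t+\upsilon,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j)\, \rho_{t-\upsilon,YX}(\lambda_j) \right] \\
& - \rho_{\upsilon,XY}(\lambda_j) \left[ \rho_{t,X}(\lambda_j)\, \rho_{t+\tau,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j)\, \rho_{t-\tau,YX}(\lambda_j) \right] \\
& + \rho_{\tau,XY}(\lambda_j)\, \rho_{\upsilon,XY}(\lambda_j) \left[ \tfrac12 \rho_{t,X}^2(\lambda_j) + \rho_{t,XY}^2(\lambda_j) + \tfrac12 \rho_{t,Y}^2(\lambda_j) \right] \Big\},
\end{aligned}
\]
where $\tau$ and $\upsilon$ denote two lags.

Proof. See Corollary 6.4.1.1 in Fuller (1996).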


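As established in the proof of Theorem 5.2, we only need square integrability
of the autospectra of $\{X_t, Y_t\}$ to ensure absolute summability of the autocovariance
and cross-covariance sequences. Therefore, using the assumptions of Theorem 5.2 we
may use the conclusion of Proposition 5.3 to determine the large sample variance of
the wavelet cross-correlation. Thus, for large $\tilde N_j$, the expression in Proposition 5.3
reduces to
\[
\begin{aligned}
\mathrm{Var}\{\tilde\rho_{XY}(\lambda_j)\} \approx \frac{1}{\tilde N_j} \sum_{t=-(\tilde N_j-1)}^{\tilde N_j-1} \Big\{ & \rho_{t,X}(\lambda_j)\, \rho_{t,Y}(\lambda_j) + \rho_{t,XY}(\lambda_j)\, \rho_{t,YX}(\lambda_j) \\
& - 2 \rho_{0,XY}(\lambda_j) \left[ \rho_{t,X}(\lambda_j)\, \rho_{t,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j)\, \rho_{t,YX}(\lambda_j) \right] \\
& + \rho_{0,XY}^2(\lambda_j) \left[ \tfrac12 \rho_{t,X}^2(\lambda_j) + \rho_{t,XY}^2(\lambda_j) + \tfrac12 \rho_{t,Y}^2(\lambda_j) \right] \Big\},
\end{aligned}
\qquad (5.14)
\]
where
\[
\rho_{t,X}(\lambda_j) \equiv \frac{ \frac{1}{2\lambda_j} E\left\{ W_{j,l}^{(X)} W_{j,l+|t|}^{(X)} \right\} }{ \sigma_X^2(\lambda_j) } \qquad (5.15)
\]
is the lag-$t$ wavelet autocorrelation at scale $\lambda_j$ for the process $\{X_t\}$.
Brillinger (1979) constructed approximate confidence intervals for the auto and
cross-correlation sequences of bivariate stationary time series. We present a brief
outline of his result for the MODWT estimated wavelet cross-correlation coefficients
in the form of a theorem.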
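Theorem 5.3 Let $L > 2d$, and suppose $\{\tilde W_{j,t}^{(X)}, \tilde W_{j,t}^{(Y)}\}$ is a bivariate Gaussian weakly
stationary process with square integrable autospectra, then the MODWT estimator
$\tilde\rho_{XY}(\lambda_j)$ of the wavelet correlation is asymptotically normally distributed with mean
$\rho_{XY}(\lambda_j)$ and large sample variance given by Equation (5.14).

Proof. Since $L > 2d$, we have that both sets of wavelet coefficients $\{\tilde W_{j,t}^{(X)}\}$ and
$\{\tilde W_{j,t}^{(Y)}\}$ have mean zero. Let us define
\[
A_{j,l} \equiv \left[ \tilde W_{j,l}^{(X)} \right]^2, \quad B_{j,l} \equiv \left[ \tilde W_{j,l}^{(Y)} \right]^2 \quad \text{and} \quad C_{j,l} \equiv \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)}
\]
and subsequently define their sample means
\[
\begin{aligned}
\bar A_j &\equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} A_{j,l} = \tilde\sigma_X^2(\lambda_j), \\
\bar B_j &\equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} B_{j,l} = \tilde\sigma_Y^2(\lambda_j) \quad \text{and} \\
\bar C_j &\equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} C_{j,l} = \tilde\gamma_{XY}(\lambda_j).
\end{aligned}
\]
The vector-valued process $\{A_{j,t}, B_{j,t}, C_{j,t}\}$ has an absolutely summable joint cumulant
sequence by Theorem 2.9.1 of Brillinger (1981, p. 38). Hence, from Lemma 5.2 the vector
of sample means $\{\bar A_j, \bar B_j, \bar C_j\}$ is asymptotically normally distributed with mean
vector $\{\sigma_X^2(\lambda_j), \sigma_Y^2(\lambda_j), \gamma_{XY}(\lambda_j)\}$, and large sample variance given by $\tilde N_j^{-1} \mathbf S_{j,ABC}(0)$,
where $\mathbf S_{j,ABC}(\cdot)$ is the $3 \times 3$ spectral matrix for $\{A_{j,t}, B_{j,t}, C_{j,t}\}$ (cf. Section C.1).
The MODWT estimator of the wavelet correlation $\tilde\rho_{XY}(\lambda_j)$ is essentially a function
of these sample means $g(\bar A_j, \bar B_j, \bar C_j)$, where $g(x, y, z) \equiv z / \sqrt{xy}$. Appealing to Mann
and Wald (1943), we have that $\tilde\rho_{XY}(\lambda_j)$ is asymptotically normally distributed with
mean $\rho_{XY}(\lambda_j)$ and large sample variance
\[
\tilde N_j^{-1}\, \dot g\!\left( \sigma_X^2(\lambda_j), \sigma_Y^2(\lambda_j), \gamma_{XY}(\lambda_j) \right)^{T} \mathbf S_{j,ABC}(0)\, \dot g\!\left( \sigma_X^2(\lambda_j), \sigma_Y^2(\lambda_j), \gamma_{XY}(\lambda_j) \right), \qquad (5.16)
\]
where $\dot g(\cdot, \cdot, \cdot)$ is the gradient of $g(\cdot, \cdot, \cdot)$. To complete the proof, we must show
the equivalence of Equation (5.16) to Equation (5.14). Because we are evaluating
$\mathbf S_{j,ABC}(\cdot)$ at $f = 0$, it is in fact a symmetric matrix of the form
\[
\mathbf S_{j,ABC}(0) =
\begin{bmatrix}
S_{j,AA}(0) & S_{j,AB}(0) & S_{j,AC}(0) \\
S_{j,AB}(0) & S_{j,BB}(0) & S_{j,BC}(0) \\
S_{j,AC}(0) & S_{j,BC}(0) & S_{j,CC}(0)
\end{bmatrix},
\]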

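where the elements of the matrix are
\[
\begin{aligned}
S_{j,AA}(0) &= 2 \int_{-1/2}^{1/2} S_{j,X}^2(f)\, df, \\
S_{j,BB}(0) &= 2 \int_{-1/2}^{1/2} S_{j,Y}^2(f)\, df, \\
S_{j,CC}(0) &= \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,Y}(f)\, df + \int_{-1/2}^{1/2} S_{j,XY}^2(f)\, df, \\
S_{j,AB}(0) &= 2 \int_{-1/2}^{1/2} S_{j,XY}(f) S_{j,YX}(f)\, df, \\
S_{j,AC}(0) &= 2 \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,YX}(f)\, df \quad \text{and} \\
S_{j,BC}(0) &= 2 \int_{-1/2}^{1/2} S_{j,Y}(f) S_{j,YX}(f)\, df.
\end{aligned}
\]
The gradient is given by
\[
\dot g\!\left( \sigma_X^2(\lambda_j), \sigma_Y^2(\lambda_j), \gamma_{XY}(\lambda_j) \right) =
\left[
-\frac{\gamma_{XY}(\lambda_j)}{2 \sigma_X^2(\lambda_j) \sqrt{\sigma_X^2(\lambda_j) \sigma_Y^2(\lambda_j)}},\
-\frac{\gamma_{XY}(\lambda_j)}{2 \sigma_Y^2(\lambda_j) \sqrt{\sigma_X^2(\lambda_j) \sigma_Y^2(\lambda_j)}},\
\frac{1}{\sqrt{\sigma_X^2(\lambda_j) \sigma_Y^2(\lambda_j)}}
\right]^{T}
\]
and, therefore, matrix multiplication of Equation (5.16) produces
\[
\begin{aligned}
& \frac{\gamma_{XY}^2(\lambda_j)}{4 \sigma_X^6(\lambda_j) \sigma_Y^2(\lambda_j)} S_{j,AA}(0)
+ \frac{\gamma_{XY}^2(\lambda_j)}{2 \sigma_X^4(\lambda_j) \sigma_Y^4(\lambda_j)} S_{j,AB}(0)
+ \frac{\gamma_{XY}^2(\lambda_j)}{4 \sigma_X^2(\lambda_j) \sigma_Y^6(\lambda_j)} S_{j,BB}(0) \\
& \quad + \frac{1}{\sigma_X^2(\lambda_j) \sigma_Y^2(\lambda_j)} S_{j,CC}(0)
- \frac{\gamma_{XY}(\lambda_j)}{\sigma_X^4(\lambda_j) \sigma_Y^2(\lambda_j)} S_{j,AC}(0)
- \frac{\gamma_{XY}(\lambda_j)}{\sigma_X^2(\lambda_j) \sigma_Y^4(\lambda_j)} S_{j,BC}(0).
\end{aligned}
\]
Utilizing Parseval's relation, each auto and cross spectrum in $\mathbf S_{j,ABC}(0)$ can be approximated
by a sum of squared auto or cross-covariance sequences, respectively.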

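Hence, we may express Equation (5.16) as
\[
\begin{aligned}
\frac{1}{\tilde N_j} \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \Bigg[ &
\frac{\gamma_{XY}^2(\lambda_j)}{4 \sigma_X^6(\lambda_j) \sigma_Y^2(\lambda_j)}\, 2 s_{j,\tau,X}^2
+ \frac{\gamma_{XY}^2(\lambda_j)}{2 \sigma_X^4(\lambda_j) \sigma_Y^4(\lambda_j)}\, 2 C_{j,\tau,XY} C_{j,\tau,YX} \\
& + \frac{\gamma_{XY}^2(\lambda_j)}{4 \sigma_X^2(\lambda_j) \sigma_Y^6(\lambda_j)}\, 2 s_{j,\tau,Y}^2
+ \frac{1}{\sigma_X^2(\lambda_j) \sigma_Y^2(\lambda_j)} \left( s_{j,\tau,X}\, s_{j,\tau,Y} + C_{j,\tau,XY}^2 \right) \\
& - \frac{\gamma_{XY}(\lambda_j)}{\sigma_X^4(\lambda_j) \sigma_Y^2(\lambda_j)}\, 2 s_{j,\tau,X} C_{j,\tau,YX}
- \frac{\gamma_{XY}(\lambda_j)}{\sigma_X^2(\lambda_j) \sigma_Y^4(\lambda_j)}\, 2 s_{j,\tau,Y} C_{j,\tau,YX} \Bigg].
\end{aligned}
\]
Each of the autocovariance terms is equivalent to the wavelet autocovariance for
scale $\lambda_j$ (defined by letting both wavelet coefficients come from the same process in
Equation (5.1)) and each cross-covariance term is equivalent to the wavelet cross-covariance
for scale $\lambda_j$. Using these quantities, Equation (5.16) may finally be expressed
as a function of auto and cross-correlations based on the wavelet coefficients
\[
\begin{aligned}
\frac{1}{\tilde N_j} \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \Big\{ & \rho_{\tau,X}(\lambda_j)\, \rho_{\tau,Y}(\lambda_j) + \rho_{\tau,XY}^2(\lambda_j) \\
& - 2 \rho_{0,XY}(\lambda_j) \left[ \rho_{\tau,X}(\lambda_j)\, \rho_{\tau,YX}(\lambda_j) + \rho_{\tau,Y}(\lambda_j)\, \rho_{\tau,YX}(\lambda_j) \right] \\
& + \rho_{0,XY}^2(\lambda_j) \left[ \tfrac12 \rho_{\tau,X}^2(\lambda_j) + \rho_{\tau,XY}(\lambda_j)\, \rho_{\tau,YX}(\lambda_j) + \tfrac12 \rho_{\tau,Y}^2(\lambda_j) \right] \Big\},
\end{aligned}
\]
which is (almost) equivalent to Equation (5.14), for large $\tilde N_j$. $\Box$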

5.4 Confidence Intervals for the Wavelet Covariance and Correlation
5.4.1 Wavelet Covariance
We now discuss how to formulate confidence intervals for the estimators of the wavelet
covariance by making use of the large sample result in Equation (5.6). This was
previously given in Lindsay, Percival, and Rothrock (1996). We will use the periodogram
(Equation (B.3)) and the cross-periodogram (Equation (C.3)) to help estimate
the quantities of interest. First, we simply use the periodogram $\hat S^{(p)}_{j,X}(\cdot)$ of
$\tilde W_{j,t}^{(X)}$, $t = L_j - 1, \ldots, N - 1$, as the estimator of $S_{j,X}(\cdot)$, so that
\[
\hat S^{(p)}_{j,X}(f) \equiv \frac{1}{\tilde N_j} \left| \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(X)} e^{-i 2\pi f l} \right|^2,
\]
and similarly for $\hat S^{(p)}_{j,Y}(\cdot)$. Next, we define the biased estimator of the autocovariance
sequence associated with the scale $\lambda_j$ MODWT wavelet coefficients of $\{X_t\}$ by
\[
\hat s^{(p)}_{j,\tau,X} \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1-|\tau|} \tilde W_{j,l}^{(X)} \tilde W_{j,l+|\tau|}^{(X)},
\]
with a similar definition for $\{\hat s^{(p)}_{j,\tau,Y}\}$, the biased estimator of the autocovariance sequence
associated with the scale $\lambda_j$ MODWT wavelet coefficients of $\{Y_t\}$. Second,
we use the cross-periodogram $\hat S^{(p)}_{j,XY}(\cdot)$ of $\tilde W_{j,t}^{(X)}$ and $\tilde W_{j,t}^{(Y)}$, $t = L_j - 1, \ldots, N - 1$, as the
estimator of $S_{j,XY}(\cdot)$, so that
\[
\hat S^{(p)}_{j,XY}(f) \equiv \frac{1}{\tilde N_j} \left( \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(X)} e^{-i 2\pi f l} \right)^{\!*} \left( \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(Y)} e^{-i 2\pi f l} \right),
\]
where the asterisk denotes complex conjugation, and the corresponding biased estimator of the cross-covariance sequence associated
with the scale $\lambda_j$ MODWT wavelet coefficients of $\{X_t, Y_t\}$ by
\[
\hat C^{(p)}_{j,\tau,XY} \equiv \frac{1}{\tilde N_j} \sum_{l} \tilde W_{j,l}^{(X)} \tilde W_{j,l+\tau}^{(Y)},
\]
where the summation goes from $l = L_j - 1$ to $N - 1 - \tau$ for $\tau \ge 0$ and from $l = L_j - 1 - \tau$
to $N - 1$ for $\tau < 0$. Substituting the periodogram estimates of the autospectra and
cross spectrum into Equation (5.6) gives us an estimator $\tilde V_j$ for the large sample
variance of the MODWT estimator of the wavelet covariance.
We can use Parseval's relation to obtain an alternative representation for $\tilde V_j$ that
uses only the autocovariance and cross-covariance sequences instead of the autospectra
and cross spectrum. Specifically, the integral of the product of the autospectra is
determined from the autocovariance sequences of $\{X_t\}$ and $\{Y_t\}$ via
\[
\begin{aligned}
\int_{-1/2}^{1/2} \hat S^{(p)}_{j,X}(f)\, \hat S^{(p)}_{j,Y}(f)\, df
&= \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,X}\, \hat s^{(p)}_{j,\tau,Y} \\
&= \hat s^{(p)}_{j,0,X}\, \hat s^{(p)}_{j,0,Y} + 2 \sum_{\tau=1}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,X}\, \hat s^{(p)}_{j,\tau,Y},
\end{aligned}
\]
and the integral of the product of the cross spectra from the cross-covariance sequence
of $\{X_t, Y_t\}$
\[
\int_{-1/2}^{1/2} \left[ \hat S^{(p)}_{j,YX}(f) \right]^2 df = \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left[ \hat C^{(p)}_{j,\tau,XY} \right]^2.
\]
We may now make an explicit definition for the large sample variance of the MODWT
estimator of the wavelet covariance using the autocovariance and cross-covariance sequences
obtained from periodogram estimates of the autospectra and cross spectrum;
i.e.,
\[
\tilde V_j \equiv \frac{\hat s^{(p)}_{j,0,X}\, \hat s^{(p)}_{j,0,Y}}{2} + \sum_{\tau=1}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,X}\, \hat s^{(p)}_{j,\tau,Y} + \frac12 \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left[ \hat C^{(p)}_{j,\tau,XY} \right]^2. \qquad (5.17)
\]
Under the assumption that the spectral estimates are close to the true values, an
approximate $100(1 - 2p)\%$ confidence interval for $\gamma_{XY}(\lambda_j)$ is
\[
\left[ \tilde\gamma_{XY}(\lambda_j) - \Phi^{-1}(1 - p) \sqrt{\frac{\tilde V_j}{\tilde N_j}},\ \ \tilde\gamma_{XY}(\lambda_j) + \Phi^{-1}(1 - p) \sqrt{\frac{\tilde V_j}{\tilde N_j}} \right],
\]
where $\Phi^{-1}(p)$ is the $p \times 100\%$ percentage point for the standard normal distribution.
Replacing the MODWT wavelet coefficients with their DWT counterparts, and adjusting
for the number of wavelet coefficients, will lead to an analogous confidence
interval for the DWT estimator of the wavelet covariance.
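
Equation (5.17) and the resulting interval translate almost line by line into code. The sketch below assumes the inputs are the non-boundary MODWT coefficients only, so that their common length plays the role of $\tilde N_j$; the function names are hypothetical, and the third term follows Equation (5.17) as written (a sum of squared cross-covariances).

```python
from math import sqrt
from statistics import NormalDist


def biased_acvs(w, tau):
    """Biased autocovariance estimate at lag |tau| (no mean removal)."""
    n, k = len(w), abs(tau)
    return sum(w[l] * w[l + k] for l in range(n - k)) / n


def biased_ccvs(wx, wy, tau):
    """Biased cross-covariance estimate at lag tau (may be negative)."""
    n = len(wx)
    indices = range(0, n - tau) if tau >= 0 else range(-tau, n)
    return sum(wx[l] * wy[l + tau] for l in indices) / n


def large_sample_variance(wx, wy):
    """Estimator of V_j following Equation (5.17)."""
    n = len(wx)
    v = biased_acvs(wx, 0) * biased_acvs(wy, 0) / 2.0
    v += sum(biased_acvs(wx, t) * biased_acvs(wy, t) for t in range(1, n))
    v += 0.5 * sum(biased_ccvs(wx, wy, t) ** 2 for t in range(-(n - 1), n))
    return v


def covariance_interval(wx, wy, p=0.025):
    """Approximate 100(1 - 2p)% confidence interval for the wavelet covariance."""
    n = len(wx)
    gamma = sum(a * b for a, b in zip(wx, wy)) / n
    half_width = NormalDist().inv_cdf(1.0 - p) * sqrt(large_sample_variance(wx, wy) / n)
    return gamma - half_width, gamma + half_width


# Tiny illustrative input (far too short for the asymptotics to apply).
v_example = large_sample_variance([1.0, -1.0], [1.0, -1.0])
lower, upper = covariance_interval([1.0, -1.0], [1.0, -1.0])
```

The default `p=0.025` gives an approximate 95% interval. The two-point input only exercises the arithmetic; in practice the large sample approximation requires many more coefficients.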

5.4.2 Wavelet Correlation


We now use the large sample theory developed in Section 5.3 to construct approximate
confidence intervals for the MODWT estimator of the wavelet correlation. Given
the non-normality of the correlation coefficient for small sample sizes, a nonlinear
transformation is sometimes required: Fisher's z-transformation (Fisher 1915; Kotz,
Johnson, and Read 1982, Volume 3). Let
\[
h(\rho) \equiv \frac12 \log\left( \frac{1 + \rho}{1 - \rho} \right) = \tanh^{-1}(\rho)
\]
define the transformation. For the estimated correlation coefficient $\hat\rho$, based on $n$
independent samples, $\sqrt{n - 3}\, (h(\hat\rho) - h(\rho))$ has approximately a $N(0, 1)$ distribution.
The factor $\sqrt{n - 3}$ leads to a better approximation of the distribution (David 1966).
An approximate $100(1 - 2p)\%$ confidence interval for $\rho_{XY}(\lambda_j)$ based on the MODWT
is therefore
\[
\left[ \tanh\left\{ h[\tilde\rho_{XY}(\lambda_j)] - \frac{\Phi^{-1}(1 - p)}{\sqrt{\hat N_j - 3}} \right\},\ \
\tanh\left\{ h[\tilde\rho_{XY}(\lambda_j)] + \frac{\Phi^{-1}(1 - p)}{\sqrt{\hat N_j - 3}} \right\} \right],
\]
where $\hat N_j$ is the number of DWT wavelet coefficients associated with scale $\lambda_j$. Note
that I am using the number of wavelet coefficients as if I had computed the point
estimates using the DWT. This is done to provide a "better" estimate of the sample
size with respect to the number of approximately uncorrelated observations. The assumption
of uncorrelated observations is only valid if we believe no systematic trends
or nonstationary features exist at that scale. If an equivalent degrees of freedom
argument were available for the wavelet covariance, this could be utilized instead of
$\hat N_j$ (cf. Section 7.5). The primary benefit here is that, by utilizing the variance stabilizing
transformation $h(\cdot)$, we can avoid estimating the large sample variance for
$\tilde\rho_{XY}(\lambda_j)$ in Equation (5.14).
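
The interval based on Fisher's z-transformation requires only the point estimate and the effective sample size. A minimal sketch follows; the function name is hypothetical, and the caller is assumed to supply the MODWT point estimate together with $\hat N_j$, the number of non-boundary DWT coefficients at that scale.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def wavelet_correlation_interval(rho, n_hat, p=0.025):
    """Approximate 100(1 - 2p)% interval for the wavelet correlation,
    computed on the Fisher z scale and mapped back with tanh."""
    if n_hat <= 3:
        raise ValueError("need more than three coefficients")
    if not -1.0 < rho < 1.0:
        raise ValueError("correlation estimate must lie strictly inside (-1, 1)")
    half_width = NormalDist().inv_cdf(1.0 - p) / sqrt(n_hat - 3)
    z = atanh(rho)
    return tanh(z - half_width), tanh(z + half_width)


# Illustrative values only.
lo0, hi0 = wavelet_correlation_interval(0.0, 103)
lo9, hi9 = wavelet_correlation_interval(0.9, 19)
```

Because the interval is formed on the transformed scale, it always lies inside $(-1, 1)$ and is asymmetric about the point estimate unless the estimate is zero.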

5.5 Comparison of Variance Estimators for the Wavelet Covariance


Looking back at Equation (5.4) we can see an alternative way to estimate the variance
of $\tilde\gamma_{XY}(\lambda_j)$ by estimating the autocovariance sequence of the product of the scale $\lambda_j$
MODWT wavelet coefficients for $\{X_t\}$ and $\{Y_t\}$; i.e.,
\[
\hat s^{(p)}_{j,\tau,XY} \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1-|\tau|} \left( \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)} - \bar E_{j,XY} \right) \left( \tilde W_{j,l+|\tau|}^{(X)} \tilde W_{j,l+|\tau|}^{(Y)} - \bar E_{j,XY} \right),
\]
where
\[
\bar E_{j,XY} \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} \tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)}.
\]
Now define the alternative variance estimate to be
\[
\bar{\tilde V}_j \equiv \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) \hat s^{(p)}_{j,\tau,XY} \qquad (5.18)
\]
(cf. Equation 5.4), and the variance of $\tilde\gamma_{XY}(\lambda_j)$ can be approximated by $\bar{\tilde V}_j / \tilde N_j$. If we
are interested in the performance of one estimator versus the other in practice, then
we can look at the moment properties of these estimators. The following sections
investigate the bias and mean squared error of $\bar{\tilde V}_j$ and $\tilde V_j$. For the bias, explicit
calculations are made and backed up with simulation results. Due to the complexity
of calculating the mean squared error explicitly, only simulation results are provided.

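5.5.1 First Moment Properties of $\bar{\tilde V}_j$

We start with the first moment (bias) properties of the variance estimator given in
Equation (5.18). The expectation of $\bar{\tilde V}_j$, using Anderson (1971, p. 449), is given by
\[
\begin{aligned}
E\left\{ \bar{\tilde V}_j \right\}
&= \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) E\left\{ \hat s^{(p)}_{j,\tau,XY} \right\} \\
&= \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) \int_{-1/2}^{1/2} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) \Bigg\{ \cos(2\pi f \tau)
- \frac{2 \sin(\pi f \tilde N_j) \sin(\pi f (\tilde N_j - |\tau|)) \cos(\pi f \tau)}{\tilde N_j (\tilde N_j - |\tau|) \sin^2(\pi f)} \\
&\qquad\qquad + \left( \frac{\sin(\pi f \tilde N_j)}{\tilde N_j \sin(\pi f)} \right)^{2} \Bigg\} S_{j,(XY)}(f)\, df,
\end{aligned}
\]
where $S_{j,(XY)}(\cdot)$ is the spectral density of the product of the wavelet coefficients
$\tilde W_{j,l}^{(X)} \tilde W_{j,l}^{(Y)}$. The bias for $\bar{\tilde V}_j$ is therefore given by
\[
\mathrm{bias}\left\{ \bar{\tilde V}_j \right\} = E\left\{ \bar{\tilde V}_j \right\} - V_j = E\left\{ \bar{\tilde V}_j \right\} - S_{j,(XY)}(0). \qquad (5.19)
\]
At first glance, it is difficult to determine the bias of $\bar{\tilde V}_j$ simply from Equation (5.19).
A surprising result comes from Percival (1993), when we restrict ourselves to the
biased estimator of the acvs in practice, i.e., before taking its expectation. Specifically,
when the process mean is unknown, the biased estimator of the acvs $\{\hat s^{(p)}_{j,\tau,XY}\}$ obeys
\[
\sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,XY} = 0.
\]
So we have that, for large sample sizes, the quantity $\bar{\tilde V}_j$ will be approximately zero
and the empirical bias will therefore be approximately equal to $-V_j$. This fact is
confirmed, through Monte Carlo simulation, below.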
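
The sum-to-zero property holds exactly for any finite sequence once the sample mean has been removed, and it can be checked directly. In the sketch below the input is an arbitrary illustrative sequence standing in for the products of wavelet coefficients; the function names are hypothetical.

```python
def centered_biased_acvs(z, tau):
    """Biased autocovariance estimate at lag |tau| about the sample mean."""
    n, k = len(z), abs(tau)
    mean = sum(z) / n
    return sum((z[l] - mean) * (z[l + k] - mean) for l in range(n - k)) / n


def alternative_variance_estimate(z):
    """Weighted sum of the centered biased acvs, as in Equation (5.18)."""
    n = len(z)
    return sum((1.0 - abs(t) / n) * centered_biased_acvs(z, t)
               for t in range(-(n - 1), n))


# Arbitrary illustrative values.
z = [0.3, -1.2, 2.5, 0.7, -0.4, 1.9]
acvs_sum = sum(centered_biased_acvs(z, t) for t in range(-(len(z) - 1), len(z)))
weighted = alternative_variance_estimate(z)
```

The unweighted sum `acvs_sum` is zero up to rounding because it equals the squared sum of the centered values divided by the sample size. The weighted sum `weighted` is not exactly zero for a short sequence; the text above argues only that it becomes approximately zero for large sample sizes.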

5.5.2 First Moment Properties of $\tilde V_j$

To simplify the notation, let $X_t$, $t = 1, \ldots, N$, and $Y_t$, $t = 1, \ldots, N$, denote the wavelet
coefficients of interest associated with level $j$. In order to examine the bias properties
of $\tilde V_j$, we will make an argument utilizing the equivalent degrees of freedom of spectral
estimators. Some preliminary results on second moment properties of spectral
estimators, from Priestley (1981, pp. 700-702), will be useful here; specifically
\[
\begin{aligned}
\mathrm{Cov}\left\{ \hat S_X(f), \hat S_Y(f) \right\} &\approx \frac{C_W}{N} S_{XY}(f) S^{*}_{XY}(f) = \frac{C_W}{N} \left| S_{XY}(f) \right|^2, && f \ne 0, \pm 1/2 && (5.20) \\
\mathrm{Var}\left\{ \hat R_{XY}(f) \right\} &\approx \frac{C_W}{N}\, \frac12 \left[ S_X(f) S_Y(f) + R^2_{XY}(f) - Q^2_{XY}(f) \right], && f \ne 0, \pm 1/2 && (5.21) \\
\mathrm{Var}\left\{ \hat Q_{XY}(f) \right\} &\approx \frac{C_W}{N}\, \frac12 \left[ S_X(f) S_Y(f) + Q^2_{XY}(f) - R^2_{XY}(f) \right], && f \ne 0, \pm 1/2 && (5.22)
\end{aligned}
\]

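where $\hat S_{XY}(\cdot)$, $\hat R_{XY}(\cdot)$ and $\hat Q_{XY}(\cdot)$ are estimates of the cross, co- and quadrature spectra.
That is, I can write $S_{XY}(f) = R_{XY}(f) - i Q_{XY}(f)$ (cf. Section C.1). The fraction
$C_W / N$ involves the smoothing window $W_m(\cdot)$ applied to the spectral estimator. Using
Equation (B.5), we can re-express this fraction as
\[
\frac{C_W}{N} = \frac{ \int_{-f_{(N)}}^{f_{(N)}} W_m^2(\phi)\, d\phi }{N} = \frac{2}{C_h\, \eta},
\]
where $C_h$ is based on the type of data taper used and $\eta$ is the equivalent degrees
of freedom for the spectral estimator (cf. Section B.3). Assuming no tapering (i.e.,
$C_h = 1$) and utilizing Equations (5.20)-(5.22), we can write
\[
\begin{aligned}
E\left\{ \hat S_X(f) \hat S_Y(f) \right\} &= \mathrm{Cov}\left\{ \hat S_X(f), \hat S_Y(f) \right\} + E\left\{ \hat S_X(f) \right\} E\left\{ \hat S_Y(f) \right\} \\
&\approx \frac{2 \left| S_{XY}(f) \right|^2}{\eta} + S_X(f) S_Y(f), && f \ne 0, \pm 1/2 \\
E\left\{ \hat R^2_{XY}(f) \right\} &= \mathrm{Var}\left\{ \hat R_{XY}(f) \right\} + \left( E\left\{ \hat R_{XY}(f) \right\} \right)^2 \\
&\approx \frac{ S_X(f) S_Y(f) + R^2_{XY}(f) - Q^2_{XY}(f) }{\eta} + R^2_{XY}(f), && f \ne 0, \pm 1/2 \\
E\left\{ \hat Q^2_{XY}(f) \right\} &= \mathrm{Var}\left\{ \hat Q_{XY}(f) \right\} + \left( E\left\{ \hat Q_{XY}(f) \right\} \right)^2 \\
&\approx \frac{ S_X(f) S_Y(f) + Q^2_{XY}(f) - R^2_{XY}(f) }{\eta} + Q^2_{XY}(f), && f \ne 0, \pm 1/2.
\end{aligned}
\]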

Note, all the above quantities are real-valued random variables. The integrals of the
squared cross spectrum and magnitude squared cross spectrum can both be expressed
through their co- and quadrature spectra; i.e.,
\[
\int_{-1/2}^{1/2} \left| S_{XY}(f) \right|^2 df = \int_{-1/2}^{1/2} R^2_{XY}(f)\, df + \int_{-1/2}^{1/2} Q^2_{XY}(f)\, df
\]
and
\[
\int_{-1/2}^{1/2} S^2_{XY}(f)\, df = \int_{-1/2}^{1/2} R^2_{XY}(f)\, df - 2i \int_{-1/2}^{1/2} R_{XY}(f) Q_{XY}(f)\, df - \int_{-1/2}^{1/2} Q^2_{XY}(f)\, df. \qquad (5.23)
\]

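The cross-product term in Equation (5.23) disappears because
\[
\int_{-1/2}^{1/2} R_{XY}(f) Q_{XY}(f)\, df
= \sum_{\tau=-\infty}^{\infty} \sum_{\upsilon=-\infty}^{\infty} e_{\tau,XY}\, o_{\upsilon,XY} \int_{-1/2}^{1/2} \cos(2\pi f \tau \Delta t) \sin(2\pi f \upsilon \Delta t)\, df = 0,
\]
where
\[
e_{\tau,XY} \equiv \frac{C_{\tau,XY} + C_{-\tau,XY}}{2} \quad \text{and} \quad o_{\tau,XY} \equiv \frac{C_{\tau,XY} - C_{-\tau,XY}}{2}
\]
are even and odd sequences, respectively, based on the cross-covariance sequence.
Therefore, Equation (5.23) reduces to
\[
\int_{-1/2}^{1/2} S^2_{XY}(f)\, df = \int_{-1/2}^{1/2} R^2_{XY}(f)\, df - \int_{-1/2}^{1/2} Q^2_{XY}(f)\, df.
\]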

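This gives an approximate expectation, since the frequencies $f = 0, \pm 1/2$ are included
in the integrals, of
\begin{align*}
E\{\tilde{V}_j\}
  &= \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat{S}_{jX}(f) \hat{S}_{jY}(f)\} \, df
   + \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat{S}^2_{jXY}(f)\} \, df \\
  &= \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat{S}_{jX}(f) \hat{S}_{jY}(f)\} \, df
   + \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat{R}^2_{jXY}(f)\} \, df
   - \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat{Q}^2_{jXY}(f)\} \, df \\
  &\approx \frac{1}{2} \int_{-1/2}^{1/2}
     \left[ \frac{2}{\eta} |S_{jXY}(f)|^2 + S_{jX}(f) S_{jY}(f) \right] df
   + \left( \frac{1}{\eta} + \frac{1}{2} \right) \int_{-1/2}^{1/2} S^2_{jXY}(f) \, df.
\end{align*}
When the magnitude squared coherence between the two processes is unity, then
$S_{jX}(f) S_{jY}(f) = |S_{jXY}(f)|^2$ and therefore
\[
E\{\tilde{V}_j\} \approx
  \left( \frac{1}{\eta} + \frac{1}{2} \right) \int_{-1/2}^{1/2} S_{jX}(f) S_{jY}(f) \, df
  + \left( \frac{1}{\eta} + \frac{1}{2} \right) \int_{-1/2}^{1/2} S^2_{jXY}(f) \, df. \tag{5.24}
\]
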
Hence, $\tilde{V}_j$ is an unbiased estimator of $V_j$ when the periodogram is used to estimate
the spectra; i.e., when $\eta = 2$ as defined in Equation (5.17). If $X_t = Y_t$, then the
MODWT estimator of the wavelet covariance $\tilde{\gamma}_{XY}(\lambda_j)$ is equivalent to the MODWT
estimator of the wavelet variance $\tilde{\nu}_X^2(\lambda_j)$ and Equation (5.24) reduces to the quantity
$A_{W_j}$, where $A_{W_j}/\tilde{N}_j$ is the large sample variance of $\tilde{\nu}_X^2(\lambda_j)$ in Percival (1995).
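
The step that produces the common factor in Equation (5.24) can be spelled out as
follows. This is only a restatement of the expectations given earlier, under the same
assumption of no tapering ($C_h = 1$, so that $C_W/N = 2/\eta$). Subtracting the expectation
of $\hat{Q}^2_{jXY}(f)$ from that of $\hat{R}^2_{jXY}(f)$ cancels the $S_{jX}(f) S_{jY}(f)$ terms:

```latex
\begin{align*}
E\{\hat{R}^2_{jXY}(f)\} - E\{\hat{Q}^2_{jXY}(f)\}
  &\approx \frac{1}{\eta} \left[ 2 R^2_{jXY}(f) - 2 Q^2_{jXY}(f) \right]
     + R^2_{jXY}(f) - Q^2_{jXY}(f) \\
  &= \left( 1 + \frac{2}{\eta} \right) \left[ R^2_{jXY}(f) - Q^2_{jXY}(f) \right].
\end{align*}
```

Integrating over $f$, multiplying by $1/2$ and using the reduced form of Equation (5.23)
gives the coefficient $1/\eta + 1/2$ on the integral of $S^2_{jXY}(f)$. With $\eta = 2$ both
coefficients in Equation (5.24) equal one.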

5.5.3 Empirical Results


The first moment properties of $\tilde{V}_j$ and $\tilde{\tilde{V}}_j$ were nontrivial to obtain. If we are to
investigate the second moment properties of these estimators, then we are dealing
with even higher moments and, hence, more complicated expressions. Instead of
solving this analytically, we will refer to empirical results. For the estimator
$\tilde{V}_j$, let us define the empirical bias
\[
\widehat{\mathrm{bias}}\{\tilde{V}_j\} \equiv \frac{1}{M} \sum_{m=1}^{M} \left( \tilde{V}_j^{(m)} - V_j \right)
\]
and the empirical mean squared error
\[
\widehat{\mathrm{mse}}\{\tilde{V}_j\} \equiv \frac{1}{M} \sum_{m=1}^{M} \left( \tilde{V}_j^{(m)} - V_j \right)^2
\]
for a given number of iterations $M$, where $\tilde{V}_j^{(m)}$ is the estimate from the $m$th
iteration. Analogous estimates may be defined for the alternative estimator $\tilde{\tilde{V}}_j$.
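
The two summaries just defined are simple averages over the $M$ simulated estimates.
A minimal sketch in Python, assuming the estimates are already collected in an array;
the function name `empirical_bias_mse` and the stand-in data are hypothetical and are
not the simulation used in this chapter.

```python
import numpy as np

def empirical_bias_mse(estimates, true_value):
    """Return (empirical bias, empirical mse) of estimates about true_value."""
    err = np.asarray(estimates, dtype=float) - true_value
    return float(err.mean()), float(np.mean(err ** 2))

# Stand-in data only: M draws scattered around a known value.
rng = np.random.default_rng(0)            # arbitrary seed
M, v_true = 500, 0.375
fake_estimates = v_true + 0.05 * rng.standard_normal(M)
bias, mse = empirical_bias_mse(fake_estimates, v_true)
print(f"bias = {bias:.5f}, mse = {mse:.5f}")
```

Note that the mean squared error is measured about the true value $V_j$, not about the
average of the estimates, so it includes the squared bias.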
The empirical bias and mean squared error for $\tilde{V}_j$ when analyzing two uncorrelated
white noise processes $\{X_t\}$ and $\{Y_t\}$, with variances $\sigma_X^2 = \sigma_Y^2 = 1$, are given in
Table 5.2. This reduces the relevant quantities to something similar to the case of the
wavelet variance (if $\sigma_X^2 = \sigma_Y^2$, as in this case, then it is exactly the wavelet variance).
We see that the estimates are quite close to their theoretical values with the Haar
and D(4) wavelet filters performing similarly. In order to compute $V_j$, we make use
of the lack of correlation between these processes to reduce Equation (5.6) to
\[
\int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j^2(f) S_{jX}(f) S_{jY}(f) \, df
  = \sigma_X^2 \sigma_Y^2 \int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j^2(f) \, df,
\]
which depends upon the squared gain function of the wavelet filter for scale $\lambda_j$. While
the squared gain function for the first scale is easy enough to compute analytically,
for higher scales this integral was evaluated through numeric integration (Press et al.
1992, Ch. 4). From the table, we see that the estimates have, on average, negligible
bias and very small mean squared error for either wavelet filter.

Table 5.2: Empirical bias and mean squared error (mse) of $\tilde{V}_j$, $j = 1, \ldots, 6$, for
uncorrelated white noise processes (N = 512), based on M = 500 realizations.

        Haar                                     D(4)
Level   V_j      ave.     bias       mse        V_j      ave.     bias       mse
1       0.3750   0.3763    0.00127   0.00317    0.4102   0.4063    0.00383   0.00336
2       0.1094   0.1086   -0.00076   0.00318    0.1315   0.1294   -0.00210   0.00056
3       0.0449   0.0443   -0.00064   0.00010    0.0620   0.0613   -0.00067   0.00024
4       0.0212   0.0210   -0.00023   0.00005    0.0308   0.0309    0.00009   0.00015
5       0.0105   0.0105    0.00000   0.00002    0.0154   0.0144   -0.00100   0.00005
6       0.0052   0.0049   -0.00033   0.00001    0.0077   0.0067   -0.00104   0.00003
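
The integral of the squared squared-gain function can also be obtained without
quadrature. The sketch below is an alternative to the numeric integration used in the
text, not the procedure used there: by Parseval's relation the integral equals the sum
of squares of the autocorrelation sequence of the level-$j$ MODWT wavelet filter. The
filter construction (upsampling and convolving the Haar or D(4) filters) and all
function names are assumptions made for this illustration.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
# Unit-energy DWT scaling filters; MODWT filters are these divided by sqrt(2).
SCALING = {
    "haar": np.array([1.0, 1.0]) / np.sqrt(2.0),
    "d4": np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0)),
}

def upsample(filt, step):
    """Insert step - 1 zeros between consecutive filter coefficients."""
    out = np.zeros((len(filt) - 1) * step + 1)
    out[::step] = filt
    return out

def modwt_wavelet_filter(name, level):
    """Equivalent MODWT wavelet filter for the given level (level >= 1)."""
    g = SCALING[name] / np.sqrt(2.0)
    h = g[::-1] * (-1.0) ** np.arange(len(g))   # quadrature mirror relation
    smooth = np.array([1.0])
    for k in range(level - 1):                   # cascade of scaling filters
        smooth = np.convolve(smooth, upsample(g, 2 ** k))
    return np.convolve(smooth, upsample(h, 2 ** (level - 1)))

def squared_gain_integral(name, level):
    """Integral over [-1/2, 1/2] of the squared squared-gain function."""
    h = modwt_wavelet_filter(name, level)
    acvs = np.correlate(h, h, mode="full")       # filter autocorrelation, all lags
    return float(np.sum(acvs ** 2))

for level in range(1, 7):
    print(level, round(squared_gain_integral("haar", level), 4),
          round(squared_gain_integral("d4", level), 4))
```

For unit-variance uncorrelated white noise this quantity is $V_j$; the level 1 values
are 0.375 for the Haar filter and about 0.4102 for the D(4) filter, in agreement with
the first row of Table 5.2.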
Figure 5.1 displays the distributions of the estimated variances $\tilde{V}_j$, $j = 1, \ldots, 6$,
between these uncorrelated white noise processes (N = 512). The estimates appear to
be distributed symmetrically about their true value at all scales. There appears to be
a slight increase in variability when using the D(4) wavelet filter over the Haar across
all scales. Also shown are the distributions of the estimated variances $\tilde{\tilde{V}}_j$, $j = 1, \ldots, 6$.
These estimates have skewed distributions and are negatively biased across all scales.
As previously stated, the constraint on the acvs with unknown mean appears to be
forcing the estimates towards $-V_j$.
[Figure 5.1: dot plots of estimate minus true value (horizontal axis, -0.4 to 0.6) for
Levels 1 to 6; panels: Haar and D(4), each for $\tilde{V}$ and $\tilde{\tilde{V}}$.]

Figure 5.1: Estimates of $\tilde{V}_j$ (left column) and $\tilde{\tilde{V}}_j$ (right column), $j = 1, \ldots, 6$, for
uncorrelated white noise processes (N = 512), based on M = 500 iterations.

One of the simplest relationships between bivariate time series is linear regression
with delay; see, e.g., Priestley (1981, pp. 663–664). If we have two time series $\{X_t\}$
and $\{Y_t\}$ with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively, that are related via
\[
Y_t = c X_{t-d} + \epsilon_t,
\]
then (cf. Section A.2) we know their spectra are related via
\[
S_Y(f) = c^2 S_X(f) + \sigma_\epsilon^2 \, \Delta t. \tag{5.25}
\]
Their cross spectrum is given by $S_{XY}(f) = c \, e^{-i 2\pi f d \, \Delta t} S_X(f)$, with co-spectrum
taking the form $R_{XY}(f) = c \cos(2\pi f d \, \Delta t) S_X(f)$ and quadrature spectrum
$Q_{XY}(f) = c \sin(2\pi f d \, \Delta t) S_X(f)$. To simplify these expressions, assume $\{X_t\}$ is a
white noise process with $\sigma_X^2 = 1$, let $c = 1$ and $\sigma_\epsilon^2 = 0$. Then Equation (5.6) reduces to
\[
c^2 \sigma_X^4 \int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j^2(f) \, df
  + c^2 \sigma_X^4 \int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j^2(f) \, e^{-i 4\pi f d} \, df
\]
and can be evaluated via numeric integration as was the case with uncorrelated
processes.
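
To make the linear regression with delay relationship concrete, the following sketch
simulates the simplified case described above (white noise input, $c = 1$, no added
noise) and recovers the delay from the sample cross-covariance. The delay $d = 3$, the
seed, the circular shift used at the series boundary, and the helper name
`circular_cross_covariance` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed
N, c, d = 512, 1.0, 3               # d = 3 is illustrative, not taken from the text
x = rng.standard_normal(N)          # white noise with unit variance
y = c * np.roll(x, d)               # Y_t = c * X_{t-d}, wrapped circularly

def circular_cross_covariance(x, y):
    """C[tau] = (1/N) * sum_t x_t * y_{t+tau} (circular), tau = 0..N-1."""
    fx = np.fft.fft(x - np.mean(x))
    fy = np.fft.fft(y - np.mean(y))
    return np.real(np.fft.ifft(np.conj(fx) * fy)) / len(x)

ccv = circular_cross_covariance(x, y)
lag_hat = int(np.argmax(ccv))       # lag at which the cross-covariance peaks
print("estimated delay:", lag_hat)
```

The peak of the cross-covariance sits at the delay, which in the frequency domain
corresponds to the linear phase term in the cross spectrum.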

Table 5.3: Empirical bias and mean squared error (mse) of $\tilde{V}_j$, $j = 1, \ldots, 6$, for white
noise processes which are related via linear regression with delay (N = 512), based
on M = 500 iterations.

        Haar                                     D(4)
Level   V_j      ave.     bias       mse        V_j      ave.     bias       mse
1       0.7500   0.7430   -0.00697   0.01523    0.8203   0.8132   -0.00717   0.01858
2       0.2188   0.2180   -0.00072   0.00200    0.2630   0.2612   -0.00105   0.00369
3       0.0898   0.0890   -0.00084   0.00060    0.1240   0.1224   -0.00159   0.00147
4       0.0425   0.0422   -0.00029   0.00030    0.0617   0.0611   -0.00057   0.00085
5       0.0209   0.0195   -0.00141   0.00014    0.0308   0.0282   -0.00266   0.00044
6       0.0104   0.0096   -0.00083   0.00005    0.0154   0.0129   -0.00250   0.00017

Table 5.3 gives the empirical bias and mean squared error for a simulation study
based on two processes related via Equation (5.25). Again, we see a slightly larger
bias and mean squared error when using the D(4) wavelet filter.
The distributions of the estimated variances $\tilde{V}_j$, $j = 1, \ldots, 6$, between processes
which are related via linear regression with delay are given in Figure 5.2. The
estimates appear to be distributed symmetrically about their true value at all scales.
There is a slight increase in variability when using the D(4) wavelet filter over the
Haar across scales. Also shown are the distributions of the estimated variances
$\tilde{\tilde{V}}_j$, $j = 1, \ldots, 6$. Again, these distributions are negatively biased about their true
value.

[Figure 5.2: dot plots of estimate minus true value (horizontal axis, -0.5 to 1.0) for
Levels 1 to 6; panels: Haar and D(4), each for $\tilde{V}$ and $\tilde{\tilde{V}}$.]

Figure 5.2: Estimates of $\tilde{V}_j$ (left column) and $\tilde{\tilde{V}}_j$ (right column), $j = 1, \ldots, 6$, minus
their true value, for processes which satisfy a linear regression with delay relationship
(N = 512), based on M = 500 iterations.

5.5.4 Conclusions
We have compared two potential estimators, $\tilde{V}_j$ and $\tilde{\tilde{V}}_j$, for the variance of the wavelet
covariance. The former is based on using periodogram-based estimates of the integrals
in Equation (5.6), while the latter uses an estimate of the autocovariance sequence in
Equation (5.4). The variance estimate $\tilde{V}_j$, defined in Equation (5.17), is an unbiased
estimator of $V_j$ and has negligible mean squared error when considering uncorrelated
or linear regression with delay processes. The alternative variance estimate
$\tilde{\tilde{V}}_j$, defined in Equation (5.18), is a negatively biased estimator of $V_j$ with the bias
approaching $-V_j$ for large $\tilde{N}_j$.

Chapter 6
APPLICATIONS
In this chapter we apply the various techniques previously introduced, such as
testing homogeneity of variance in time series and analyzing bivariate time series
using wavelet estimators, to real data.
The Nile River minimum water levels (Toussoun 1925) form a time series of yearly
measurements starting in 622 AD and continuing, with both large and small gaps of
missing values, into the twentieth century. We analyze the first continuous piece of
the series from 622 AD to 1284 AD. A key feature of the series is a marked increase
in variability during the first century of measurements (Beran 1994, Sec. 10.3). We
compare results from our wavelet analysis to those of Beran and Terrin (1996), who
used a test statistic to detect a change in the long memory parameter in the
time series. We find a significant change of variance around 720 AD which coincides
with the construction of an instrument, called a nilometer, in 715 AD.
A time series of vertical ocean shear measurements (Percival and Guttorp 1994),
where the observations are indexed by depth rather than time, is analyzed to detect
multiple variance changes in the series. Two bursts of increased variability occur
towards the beginning and the end of the series. When comparing the full series to the
4096 observations in the middle of the series (used in Percival and Guttorp (1994)),
there is increased variability in the first 5 scales only. Applying the multiple variance
change detection procedure (Section 4.6) to this series yields a variety of significant
variance changes in the first five scales. The two obvious bursts of variability (at 450 m
and 1000 m) are adequately identified, and a third burst around 800 m appears in the
first four scales.

The Madden–Julian oscillation (MJO) (Madden and Julian 1971) was originally
discovered using bivariate spectral analysis; i.e., the lag window estimators of
co-spectrum and magnitude squared coherence. Since then it has been identified and
described by researchers in a variety of physical disciplines; see Madden and Julian
(1994) for a review. I reanalyze the data used by Madden and Julian (1971) using
multitaper spectral techniques and, more importantly, bivariate wavelet techniques
developed in Chapter 5. The multitaper estimates of the co-spectrum and magnitude
squared coherence show a much narrower range of periods for the MJO, primarily because
the amount of smoothing has been drastically reduced with respect to corresponding
lag window estimates. The estimated wavelet correlation and cross-correlation agree
with the original findings. A peak in the estimated correlation occurs in the fifth scale,
corresponding to changes of 16 days and frequencies $1/64 \le f \le 1/32$ cycles per day,
between station pressure and 850 mb zonal wind, and between 150 mb and 850 mb
zonal winds, with a small lead/lag relationship between the atmospheric variables.
While analyzing atmospheric time series collected at Canton Island (Madden and
Julian 1971), we provided an empirical validation of the bivariate wavelet techniques,
but did not take advantage of the time-localization properties which the wavelet
transform possesses. With this in mind, we turn our attention to investigating the possible
interaction between El Niño–Southern Oscillation (ENSO) events and the MJO. Using
daily station pressure readings from Darwin, Australia, and Tahiti, French Polynesia,
we construct a (daily) Southern Oscillation Index from roughly 1957 to 1992
by simply differencing the observations at these two stations; this is a measure of
ENSO activity. Similar readings from Truk Island are used as a proxy for the MJO.
A bivariate wavelet analysis is performed between these two atmospheric time series.
We find a large peak in the fifth scale of the wavelet correlation corresponding to the
MJO. The wavelet cross-correlation nicely "decomposes" the usual cross-correlation
into a few distinct patterns. The time-varying structure of the wavelet variance and
covariance is also qualitatively analyzed by partitioning them by season and ENSO
activity.

6.1 Nile River Minimum Water Levels


6.1.1 Introduction
"In spite of all the changing, uncertain, and erroneous factors that
must be considered in connection with records of stages of the Nile River,
it is believed that they disclose some important information and there is
a fair prospect that they may yield more data with further study and the
cumulation of ideas for various students."
The words of Jarvis (1936) are very prophetic, for in fact data collected from the
Nile River have spurred the development of a whole field of mathematics (i.e.,
fractional Brownian motion and fractional Gaussian noise) along with a field of statistics
concerned with the behavior of long memory time series. Gathered by Toussoun
(1925), there exists a remarkable hydrological time series of minimum and maximum
water levels for the Nile River. Starting in 622 AD, the first missing observation in
the annual minima occurs in 1285 AD. This leaves us with a complete record for 663
years to analyze, shown in Figure 6.1.
A reasonable amount of literature has been written about Toussoun's Nile River
data. Some notable facts are given here. The minimum water levels for the Nile River
are not actually the yearly minima. These values were recorded around the end of
June each year, whereas the maximum water levels were the actual yearly maxima
(Popper 1951; Verner 1972; Leftus 1986), even though Brooks (1949, p. 329) notes
that the lowest levels of the Nile occur in April and May with erratic behavior in
June and the beginning of July.
Various time domain and spectral domain methods have been used to analyze
these data. Given the current state of knowledge about this process and its apparent
long memory structure, these past results will largely be ignored. Statistical modeling
of this time series as a long memory process began with the doctoral works of Mohr
(1981) and Graf (1983). Both used Fourier transform (periodogram) analysis for
estimating the self-similarity parameter of a fractional Gaussian noise model. Graf
(1983) reported estimates of $H = d + \frac{1}{2}$ between 0.83 and 0.85.

[Figure 6.1: time series plot; vertical axis: Minimum Water Level (cm), 1000 to 1400;
horizontal axis: Year, 600 to 1200.]

Figure 6.1: Nile River minimum water levels for 622 AD to 1284 AD. These data
can be obtained via the World Wide Web at http://lib.stat.cmu.edu/S/ under
the title `beran'. This is the address for StatLib, a statistical archive maintained by
Carnegie Mellon University.

Beran (1994, p. 118) has reported estimates of H = 0.84 for fractional Gaussian
noise and H = 0.90 for a fractional ARIMA model with 95% confidence intervals of
(0.79, 0.89) and (0.84, 0.96), respectively. He also established a goodness-of-fit test
for the spectral density of a long memory process. An approximate p-value for the
fractional Gaussian noise model of the yearly minimum water levels of the Nile River
is 0.70, meaning that fractional Gaussian noise appears to fit the spectral density of
the Nile River series well.

[Figure 6.2: six stacked time series panels (Data, D1, D2, D3, D4, S4); horizontal
axis: Years, 600 to 1300.]

Figure 6.2: Multiresolution analysis of the Nile River minimum water levels using
the D(4) wavelet filter and MODWT. The top plot of the figure is the series itself,
while the five time series plotted below it constitute an additive decomposition of the
series into components associated with (from top to bottom) variations on scales of
1 year ($\tilde{\mathcal{D}}_1$), 2 years ($\tilde{\mathcal{D}}_2$), 4 years ($\tilde{\mathcal{D}}_3$), 8 years ($\tilde{\mathcal{D}}_4$) and 16 years or longer ($\tilde{\mathcal{S}}_4$). The
vertical dotted line splits the series into two parts: the first 100 observations (from
622 to 721 AD) and the remaining 563 observations (722 to 1284 AD).

6.1.2 Wavelet Analysis


Figure 6.2 shows a wavelet-based multiresolution analysis of Nile River minimum
water levels. The subseries $\tilde{\mathcal{D}}_j$ and $\tilde{\mathcal{S}}_J$ in a multiresolution analysis form an additive
decomposition of the original time series
\[
Y_t = \sum_{j=1}^{J} \tilde{\mathcal{D}}_{j,t} + \tilde{\mathcal{S}}_{J,t}.
\]
Each subseries $\tilde{\mathcal{D}}_j$ is associated with changes at scale $\lambda_j = 2^{j-1}$, while $\tilde{\mathcal{S}}_J$ is associated
with weighted averages over scales of $2^J$; see Percival and Mofjeld (1997) for more
details. We used the D(4) wavelet in conjunction with the MODWT, extended to
N coefficients at each scale by assuming periodic boundary conditions. Visually it
appears that there is greater variability in changes on scales of 1 and 2 years prior to
722 AD, but not on longer scales. Beran (1994, Sec. 10.3) investigated the question
of a change in the long memory parameter in this time series by partitioning the
first 600 observations into two subseries containing, respectively, the first 100 and
the remaining 500 measurements. Estimates of the long memory parameter d, using
maximum likelihood, were quite different between the two subseries, 0.04 and 0.38
respectively. This analysis suggests a change in d, a conclusion that was also drawn in
Beran and Terrin (1996) using a procedure designed to test for a change in the long
memory parameter. We can perform a similar analysis using the wavelet variance
$\nu_Y^2(\lambda_j)$, which makes use of the DWT or MODWT to decompose the variance of
$\{Y_t\}$ on a scale by scale basis (cf. Section 3.4). The estimated MODWT wavelet
variances, given a partitioning scheme similar to the one used by Beran, are displayed
in Figure 6.3. We see that the 95% confidence intervals for scales of 1 and 2 years do
not overlap, which agrees with the apparent change of variance for those same scales
in Figure 6.2.
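
A scale-by-scale variance comparison of this kind can be sketched in a few lines. The
example below is an illustration only: it uses the Haar filter rather than the D(4)
filter used in the text, keeps only coefficients unaffected by the boundary, and omits
the confidence intervals. The function names are hypothetical.

```python
import numpy as np

def haar_modwt_filter(level):
    """Level-j Haar MODWT wavelet filter of length 2**level."""
    half = 2 ** (level - 1)
    return np.concatenate([np.ones(half), -np.ones(half)]) / 2 ** level

def haar_wavelet_variance(x, level):
    """Average of squared non-boundary Haar MODWT coefficients at one level."""
    h = haar_modwt_filter(level)
    # mode="valid" keeps only coefficients that do not wrap around the ends
    w = np.convolve(np.asarray(x, dtype=float), h, mode="valid")
    return float(np.mean(w ** 2))

# Stand-in series: noisier over the first 100 points than over the rest.
rng = np.random.default_rng(2)          # arbitrary seed
series = np.concatenate([3.0 * rng.standard_normal(100),
                         rng.standard_normal(563)])
for level in (1, 2, 3, 4):
    early = haar_wavelet_variance(series[:100], level)
    late = haar_wavelet_variance(series[100:], level)
    print(f"scale {2 ** (level - 1)}: first 100 = {early:.3f}, rest = {late:.3f}")
```

For white noise with variance $\sigma^2$ the Haar wavelet variance at level $j$ is
$\sigma^2 / 2^j$, which gives a quick sanity check on the output.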
For a fractional difference process we have $\nu_Y^2(\lambda_j) \propto \lambda_j^{2d-1}$ approximately, so we
can estimate d by regressing $\log \tilde{\nu}_Y^2(\lambda_j)$ on $\log \lambda_j$ and using the estimated slope $\hat{\beta}$
to form $\hat{d} = \frac{1}{2}(\hat{\beta} + 1)$ (Percival and Walden 1999, Sec. 8.1). This procedure yields
estimates of $\hat{d} = 0.38$, 0.42 and $-0.07$ for, respectively, the whole time series, the
last 563 observations and the first 100 observations. These compare favorably with
Beran's values of 0.40, 0.38 and 0.04, but it is clear from Figure 6.3 that the smaller
value for $\hat{d}$ in the first 100 years is due to increased variability at scales of 2 years or
less. The observed difference in variability at longer scales between the first and last
portions of the time series is consistent with sampling variability.

[Figure 6.3: log-log plot; vertical axis: Wavelet Variance, 50 to 10000; horizontal
axis: Scale (years), 1 to 64; legend: 622 - 721 AD, 722 - 1284 AD.]

Figure 6.3: Estimated D(4) wavelet variances for the Nile River minimum water levels
before and after the year 722 AD, along with 95% confidence intervals based upon a
chi-square approximation given in Percival (1995).
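
The regression estimate of $d$ described above amounts to an ordinary least squares
line through the log wavelet variances. A minimal sketch, assuming unweighted least
squares and wavelet variances that have already been estimated; the function name
`estimate_d` and the synthetic input are hypothetical.

```python
import numpy as np

def estimate_d(scales, wavelet_variances):
    """Slope of log variance on log scale, mapped to d = (slope + 1) / 2."""
    slope, _intercept = np.polyfit(np.log(scales), np.log(wavelet_variances), 1)
    return 0.5 * (slope + 1.0)

# Synthetic check: variances that follow the power law exactly.
scales = 2.0 ** np.arange(0, 5)                 # 1, 2, 4, 8, 16
d_true = 0.4
nu2 = 3.0 * scales ** (2.0 * d_true - 1.0)      # constant 3.0 is arbitrary
d_hat = estimate_d(scales, nu2)
print(round(d_hat, 6))
```

Because only the slope enters the estimate, any scale-independent multiplicative
constant in the wavelet variances has no effect on $\hat{d}$.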

6.1.3 Testing for Homogeneity of Variance


Let us now apply the methodology developed in Chapter 4 to the Nile River minima.
Using all N = 663 values in the time series, we computed our test statistic D (cf.
Section 4.2.1) for scales of 1, 2, 4 and 8 years (j = 1, 2, 3, 4) based, respectively, on
331, 115, 57 and 28 wavelet coefficients. The results from the test, shown in Table 6.1,
confirm the visual appearance of inhomogeneity of variance at scales of 1 and 2 years,
but fail to reject the null hypothesis of variance homogeneity at scales of 4 and 8 years.

Table 6.1: Results of testing the Nile River minimum water levels for homogeneity
of variance (N = 663) using the Haar wavelet filter with Monte Carlo critical values.
As shown in the table, the test statistic at scale 1 is significant at the 1% level, and
the test statistic at scale 2 is significant at the 5% level.

Scale   D        10% critical level   5% critical level   1% critical level
1       0.1559   0.0945               0.1051              0.1262
2       0.1754   0.1320               0.1469              0.1765
4       0.1000   0.1855               0.2068              0.2474
8       0.2313   0.2572               0.2864              0.3436

With a change of variance detected in the first and second scales, we can apply
the methodology from Section 4.5 to locate these change points. Figure 6.4 displays
the normalized cumulative sum of squares as a function of wavelet coefficient for the
first two scales. We see a sudden accumulation of variance in the first 100 years and
a gradual tapering off of the variance afterwards (by construction the series must
begin and end at zero). The maximum is actually attained in 720 AD for the level 1
coefficients and 722 AD for level 2. The subsequent smaller peaks occurring in the
ninth century are associated with large observations, as seen in the original series,
not changes in the variance of the time series.
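
A normalized cumulative sum of squares statistic and the associated change point
estimate can be sketched as follows. Section 4.2.1 lies outside this excerpt, so the
exact definition of D is assumed here to take the common form based on the largest
positive and negative departures of the normalized cumulative sum from a straight
line; the function name `cusum_of_squares` is hypothetical.

```python
import numpy as np

def cusum_of_squares(w):
    """Return (D, k_hat) for a sequence of wavelet coefficients w.

    D is the largest departure of the normalized cumulative sum of squares
    from linear growth; k_hat (1-based) is where that departure occurs.
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    p = np.cumsum(w ** 2) / np.sum(w ** 2)    # P_1, ..., P_n with P_n = 1
    k = np.arange(1, n)                        # k = 1, ..., n - 1
    pk = p[:-1]
    d_plus = k / (n - 1) - pk                  # line above the cumulative sum
    d_minus = pk - (k - 1) / (n - 1)           # cumulative sum above the line
    i_plus, i_minus = int(np.argmax(d_plus)), int(np.argmax(d_minus))
    if d_plus[i_plus] > d_minus[i_minus]:
        return float(d_plus[i_plus]), int(k[i_plus])
    return float(d_minus[i_minus]), int(k[i_minus])

# Deterministic illustration: large coefficients for the first 100 points.
w = np.concatenate([3.0 * np.ones(100), np.ones(500)])
D, k_hat = cusum_of_squares(w)
print(round(D, 4), k_hat)
```

In practice D would be compared against critical values such as those in Table 6.1,
and the location step would be applied to MODWT coefficients so that the estimated
index maps directly onto the time axis.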

[Figure 6.4: three stacked panels (Data, D2, D1); vertical axes: 1000 to 1400 for the
data and 0.0 to 0.15 for D2 and D1; horizontal axis: Years, 600 to 1300.]

Figure 6.4: Normalized cumulative sum of squares from the first two scales of the
MODWT for the Nile River minimum water levels. The vertical dotted line is at
715 AD.
The source document for this series (Toussoun 1925) and subsequent historical
studies by Popper (1951) and Balek (1977, Ch. 1) all indicate the construction in
715 AD of a "nilometer" in a mosque on Roda Island in the Nile River near Cairo.
The yearly minimum water levels for 715 AD to 1284 AD were measured using this
device, or a reconstruction of it done in 861 AD. The precise source of measurements
for 622 AD to 714 AD is unknown, but they were most likely made at different
locations around Cairo, with possibly different types of measurement devices, of less
accuracy than the one in the Roda Island mosque. Our estimated change point at
720 or 722 AD coincides well with the construction of this new instrument in 715 AD,
and it is reasonable that this new nilometer led to a reduction in variability at the
very smallest scales.
Beran and Terrin (1996) looked at the Nile River minimum water levels and
used a test statistic to argue for a change in the long memory parameter in the
time series. The results from our analysis, in conjunction with an examination of
the historical record, suggest an alternative interpretation: there is a decrease in
variability at scales of 2 years and less after about 720 AD, and this decrease
is due to a new measurement instrument rather than to a change in the long term
characteristics of the Nile River.

6.2 Vertical Ocean Shear Measurements


Percival and Guttorp (1994) analyzed a set of vertical ocean shear measurements.
The data were collected by dropping a probe into the ocean which records the water
velocity every 0.1 meter as it descends. Hence, the "time" index is really depth (in
meters). The shear measurements (in $\mathrm{s}^{-1}$) are obtained by taking a first difference
of the velocity readings over 10 meter intervals and applying a low-pass filter to the
differenced readings.
Figure 6.5 shows all 6875 observations available for analysis. We see two sections
of greater variability, one around 450 m and the other around 1000 m, with a fairly
stationary section in between. Percival and Guttorp (1994) commented on this fact
and only looked at 4096 observations ranging from 489.5 m to 899.0 m in their paper.
Wang, Cavanaugh, and Song (1997) analyzed the full time series in order to estimate
a time-varying self-similarity parameter using the DWT. We propose to apply the
methodology for detecting and locating multiple variance changes (cf. Section 4.6)
to this geophysical series.

[Figure 6.5: time series plot; vertical axis: 1/s, -6 to 0; horizontal axis: meters,
400 to 1000.]

Figure 6.5: Plot of vertical shear measurements (inverse seconds) versus depth
(meters). The two vertical lines are at 489.5 m and 899.0 m, and denote the roughly
stationary series used by Percival and Guttorp (1994). This series can be obtained
via the World Wide Web at http://lib.stat.cmu.edu/datasets/ under the title
`lmpavw'.
Figure 6.6 gives a multiresolution analysis of the ocean shear time series using
the D(4) wavelet. The eight time series plotted constitute a portion of an additive
decomposition of the series into components associated with (from top to bottom)
variations on scales of 0.1 meters ($\tilde{\mathcal{D}}_1$), 0.2 meters ($\tilde{\mathcal{D}}_2$), 0.4 meters ($\tilde{\mathcal{D}}_3$), 0.8 meters
($\tilde{\mathcal{D}}_4$), 1.6 meters ($\tilde{\mathcal{D}}_5$), 3.2 meters ($\tilde{\mathcal{D}}_6$), 6.4 meters ($\tilde{\mathcal{D}}_7$) and 12.8 meters ($\tilde{\mathcal{D}}_8$). We
see a persistence of the increased variability in the first 5 scales around 1000 m, and
to a lesser extent, around 450 m.

[Figure 6.6: eight stacked panels (D1 through D8); horizontal axis: Depth (meters),
400 to 1000.]

Figure 6.6: Multiresolution analysis of the vertical ocean shear measurements using
the D(4) wavelet filter and maximal overlap discrete wavelet transform. The first
eight details $\tilde{\mathcal{D}}_1$–$\tilde{\mathcal{D}}_8$ are displayed with each series on the same vertical scale. The
two vertical lines are at 489.5 m and 899.0 m, and denote the wavelet coefficients
used by Percival and Guttorp (1994).

[Figure 6.7: log-log plot; vertical axis: wavelet variance, 0.001 to 1.000; horizontal
axis: scale (0.1 meters), 1 to 512; legend: N = 6875, N = 4096.]

Figure 6.7: Estimated wavelet variance of the vertical ocean shear measurements
using the D(4) wavelet filter and MODWT. The light grey confidence intervals
correspond to all 6875 observations, while the dark grey confidence intervals correspond
to the middle 4096 observations as analyzed in Percival and Guttorp (1994).

The influence of the ends of the time series (i.e., the observations outside the
vertical dotted lines in Figure 6.5) is most evident when comparing its wavelet variance
to the wavelet variance of the middle 4096 observations; see Figure 6.7. The
bursts of increased variability observed in the first 5 scales make a significant
contribution to the wavelet variance. For those scales, the confidence intervals do not
overlap between the full and truncated time series, whereas the confidence intervals
do overlap for all subsequent scales. As with the Nile River minimum water levels,
this feature hints at a possible heterogeneity of variance in the first 5 scales.
Whether these data should be classified as having long-range dependence is not
obvious. The roll-off of the wavelet variance at the higher scales (lower frequencies)
does not fit with the general framework of a fractional difference process. Wang et al.
(1997) estimated a time-varying long memory parameter for these measurements. The
middle of the series has a roughly constant long memory parameter between 0.65 and
0.70, while the ends of the series exhibit much greater long memory parameters. I will
not concentrate on modeling this process as a globally or locally self-similar process, but
instead investigate the nonstationary features through testing for homogeneity of
variance on a scale by scale basis.
Figure 6.8 shows the MODWT wavelet coefficients for the first five scales of the
vertical ocean shear measurements. The vertical dotted lines are the estimated
locations of variance change points using the DWT to test and the MODWT to locate
with asymptotic critical values ($\alpha = 0.05$). The procedure does a good job of isolating
the two regions of increased variability at 450 m and 1000 m in each scale, except
for the second scale. There, the first burst has been "picked apart" by the procedure
with 10 distinct stationary regions. This does not seem appropriate and it is unclear
why this only occurred on the second scale when the third scale appears to be
similar in changing variability with time. Besides the two obvious regions of increased
variability, there appears to be a third burst around 800 m. It is present, to differing
degrees, in the first four scales whereas most other bursts disappear after the first
and second scale. This is a much more subtle type of nonstationarity, compared to
the obvious bursts at 450 m and 1000 m, and not particularly visible in the original
time series with the naked eye.

[Figure 6.8: five stacked panels (Level: 5 through Level: 1) of MODWT wavelet
coefficients; vertical axis: 1/s; horizontal axis: Depth (meters), 400 to 1000.]

Figure 6.8: Estimated locations of variance change for the vertical ocean shear
measurements using the D(4) wavelet filter displayed on the MODWT wavelet coefficients.
Only the first five scales were found to have significant changes of variance.
Asymptotic critical values were used for the hypothesis testing at the $\alpha = 0.05$ level of
significance.

This algorithm for detecting and locating multiple variance changes via the DWT
is in its infancy. More work is needed in order to refine the procedure and investigate
its properties. Given the ability of the DWT to remove heavy amounts of autocorrelation
in time series, this method has wide application in many fields. The test
can handle high amounts of autocorrelation, as found in stationary long memory
processes, and a further advantage of the procedure is that only limited assumptions are
made with respect to the underlying spectrum of the observed physical process.

6.3 Wavelet and Multitaper Spectral Analysis of the Madden–Julian Oscillation

6.3.1 Introduction
The Madden–Julian oscillation (MJO) (Madden and Julian 1971) is a particular
atmospheric phenomenon which has been discovered in a variety of studies involving
data from the tropical Pacific Ocean from 1971 to the present; see Madden and Julian
(1994) for a review. In their first paper, the authors utilized bivariate spectral time
series analysis in order to detect the oscillation. Specific atmospheric variables used
were the station pressure, 850 mb wind speed, and 150 mb wind speed. We propose a
univariate and bivariate spectral analysis using multitaper methods (Thomson 1982;
Percival and Walden 1993, Ch. 7) and then a bivariate wavelet analysis on the same
time series originally used from Canton Island (2.8° S, 171.7° W); see Figure 6.9 for
the location of Canton Island in the Pacific Ocean.
The data, obtained from NCAR (the National Center for Atmospheric Research),
consist of three time series measured at the same location from 1 June 1957 to
31 March 1967. The atmospheric variables of interest are station pressure, wind
speed at 150 mb and wind speed at 850 mb. Measurements were taken at 0000 GMT
and 1200 GMT. As in the original paper, only the 0000 GMT observations are used.
Hence, we can regard the sampling interval of the series as $\Delta t = 1$ day. This gives us
a Nyquist frequency of $f_{(N)} \equiv 1/(2 \Delta t) = 1/2$ cycles/day. For days with no
measurements recorded, an ARIMA(3,1,0) model was fit and one-step-ahead predictions
were used to fill in the gaps (Jones 1980). The majority of missing values were
isolated observations, except for a week of missing data between 5 January 1965 and
10 January 1965. This gives us three time series of length 3591, which are shown in
Figure 6.10. The ragged look of each series is because no decimal places were kept
for any measurement.

[Figure 6.9: map with labeled stations Truk, Canton, Darwin and Tahiti.]

Figure 6.9: Climate stations in the tropical Pacific Ocean. The horizontal line is the
equator, plotted for reference. The horizontal range is roughly from 110° E to 140° W
and the vertical range is ±45°.

We first analyze each time series separately, starting with the periodogram and
then apply a variety of techniques to investigate any potential sources of bias.
Conclusions drawn here are compared with those found in Madden and Julian (1971).
The lag window spectral estimates utilized in the original paper are reproduced, as
best as possible, and compared with multitaper spectral estimates. After looking at
the three series independently, we perform a bivariate spectral analysis between the
three possible pairings of the time series. Lag window estimates of the co-spectrum
and magnitude squared coherence are contrasted with their corresponding multitaper
estimates. A bivariate wavelet analysis is then performed in order to see how these
new techniques compare to classical spectral analysis.

[Figure 6.10: three stacked time series panels (Station Pressure, 1000 to 1015;
850 mb Wind, 0 to 20; 150 mb Wind, 0 to 30); horizontal axis: time, 1958 to 1966.]

Figure 6.10: Atmospheric time series collected from Canton Island (2.8° S, 171.7° W)
over the period 1 June 1957 to 31 March 1967. From top to bottom, they are station
pressure (in hPa), wind speed at 850 mb and wind speed at 150 mb (both in km/h).

6.3.2 Univariate Spectral Analysis


We begin with a univariate spectral analysis of the station pressure, 150 mb wind
speed, and 850 mb wind speed series. The periodogram for each showed no obvious
signs of leakage; i.e., the transfer of power from one region of the spectrum to another.
Common indications include a change in the variability of the periodogram for specific
ranges of frequencies. An approximate dynamic range for all three series is around
30–35 dB. When applying a 10% cosine taper to these data, there are no obvious
differences between the periodogram and direct spectral estimator. A discrete prolate
spheroidal sequence (dpss) data taper (cf. Section B.2), with NW = 4, was applied
to the three series with no apparent changes in station pressure or 850 mb wind speed,
but a marked accentuation of shape is seen for 150 mb wind speed. Hence, it appears
that little or no tapering is required for the station pressure and 850 mb wind series,
but caution should be exercised when analyzing the 150 mb wind speed.
Madden and Julian (1971) performed the following algorithm to estimate the
univariate spectra:

"The algorithm makes use of the modified Fourier periodogram obtained
by 1) removing the sample mean of the N members of the time series, 2)
`tapering' the first and last 10% of the resulting N members by multipli-
cation by a segment of the cosine curve so that the ends of the series are
zero, and 3) performing the FFT to obtain N/2 harmonic coefficients. The
squared amplitudes or modified periodogram estimates are then averaged
by a running average of length L coefficients; this averaging producing
an estimate of the continuous spectra viewed through a rectangular spec-
tral window of bandwidth equal to $(2L/N) f_N$ where $f_N$ is the Nyquist
frequency."
and reference Bingham et al. (1967) for their Fourier methodology.
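
The three quoted steps and the running average can be sketched as follows. This is our reading of the quotation, not the original code: the exact shape of the cosine segment, the periodogram scaling, and the treatment of the ends of the running average (zero padding) are assumptions, and the function names are hypothetical.

```python
import numpy as np

def cosine_taper(n, p=0.10):
    """Taper the first and last fraction p of n points with a cosine segment
    so that both ends of the series are exactly zero."""
    w = np.ones(n)
    m = int(np.floor(p * n))
    if m > 0:
        ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(m) / m))  # rises from 0 toward 1
        w[:m] = ramp
        w[-m:] = ramp[::-1]
    return w

def smoothed_modified_periodogram(x, L, dt=1.0, p=0.10):
    """Steps 1-3 of the quotation followed by a length-L running average."""
    x = np.asarray(x, dtype=float)
    w = cosine_taper(x.size, p)
    tapered = w * (x - x.mean())                    # 1) remove mean, 2) taper
    pgram = dt * np.abs(np.fft.rfft(tapered)) ** 2 / np.sum(w ** 2)  # 3) FFT
    kernel = np.ones(L) / L                         # rectangular spectral window
    smooth = np.convolve(pgram, kernel, mode="same")  # zero-padded at the ends
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs, pgram, smooth
```

Under the quoted bandwidth formula, $(2L/N) f_N = L/(N \Delta t)$, so the choice of L fixes the bandwidth directly.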
In order to reproduce the results of Madden and Julian (1971), some preliminary
interpretations and calculations must be performed. From the quotation given above,
they appear to have utilized a lag window spectral estimate using the Daniell smooth-
ing window. From p. 703, "The value of L was chosen so that the bandwidth of the
spectral window was 0.0081 day$^{-1}$." Using the formula from Table 269 in Percival
and Walden (1993), we can compute the window parameter
$$m = \frac{1}{B_W \, \Delta t} = \frac{1}{0.0081 \cdot 1} \approx 123.$$
For the Daniell smoothing window, the parameter m controls the amount of averaging
across frequencies of the spectral estimate: the smaller the m, the more averaging
occurs. Upon comparing our lag window spectral estimators with those in the original
work, they do not appear to have the same degree of smoothness. Two possible
explanations for this difference are that they did not use lag window spectral estimators
as defined in Percival and Walden (1993), or that the lag window spectral estimates were
smoothed again, possibly by splines, before publication. Regardless, we can obtain
a reasonably smooth spectral estimator by replacing the Daniell smoothing window
with the Parzen smoothing window. A recalculation of $m \approx 228$ for the Parzen
smoothing window is required, where m is now a truncation point: the lag window
sets the estimated acvs to zero for lags greater than m. The left column of plots in
Figure 6.11 shows these lag

window spectral estimates. The station pressure spectrum is very close to the one
displayed in Madden and Julian (1971).
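
The two window parameters quoted above follow from inverting the bandwidth relation $B_W = c/(m \Delta t)$. The sketch below assumes $c = 1$ for the Daniell window and $c = 1.85$ for the Parzen window; the latter constant is not stated in the text and is inferred from the reported $m \approx 228$. The names are hypothetical.

```python
# Constants c in B_W = c / (m * dt); the Parzen value is an assumption
# consistent with the m = 228 reported in the text.
BANDWIDTH_CONSTANT = {"daniell": 1.0, "parzen": 1.85}

def window_parameter(bandwidth, dt=1.0, window="daniell"):
    """Window parameter m that gives the requested smoothing-window bandwidth."""
    return BANDWIDTH_CONSTANT[window] / (bandwidth * dt)

m_daniell = window_parameter(0.0081, window="daniell")
m_parzen = window_parameter(0.0081, window="parzen")
```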
As an alternative to lag window spectral estimation, we apply multitaper spec-
tral estimation (Thomson 1982) to these series. Several dpss data tapers, which are
orthogonal and normalized to have unit energy, are applied to the time series. The
modulus squared Fourier transforms of these tapered series (also known as eigenspec-
tra) are then averaged across tapers at each frequency. The right column of plots in
Figure 6.11 shows these multitaper spectral estimates. Although these spectral estimates
are much less ragged than the periodogram, they are far from the smoothness of a lag
window spectral estimate. We see an "annual" peak near zero frequency in the three multi-
taper spectra, with the peak being the largest in the 150 mb wind speed series. This
is most likely due to its relatively flat background spectrum. Notice that the multitaper
spectral estimate for the 150 mb wind speed series agrees with the periodogram in
shape. This is contrary to the result using a direct spectral estimator (dpss, NW = 4)
stated at the beginning of this section.
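
A basic (unweighted) multitaper estimate of the kind just described can be sketched as below. It assumes SciPy's `scipy.signal.windows.dpss` for the tapers and a simple arithmetic mean of the eigenspectra; adaptive weighting is not attempted, and the function name is hypothetical.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(x, nw=4.0, k=5, dt=1.0):
    """Average of k eigenspectra computed with dpss data tapers."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    tapers = dpss(x.size, nw, Kmax=k)                 # shape (k, N)
    # Enforce unit energy explicitly rather than rely on a default normalization.
    tapers = tapers / np.sqrt((tapers ** 2).sum(axis=1, keepdims=True))
    eigenspectra = dt * np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs, eigenspectra.mean(axis=0)           # mean over tapers
```

With NW = 4 the estimate at each frequency draws on a band of half-width $W = NW/(N \Delta t)$, which is why a single sinusoid appears as a flat-topped peak rather than a spike.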
There is an issue in how to handle the annual cycle in these data. From Madden
and Julian (1971, p. 703),

"The particular algorithm used to estimate the spectra included an ad-
justment to eliminate the annual period and higher harmonics thereof
from the series. This was accomplished by substituting the average of
four adjacent modified periodogram estimates for the estimates at those
frequencies nearest to the annual and semiannual frequencies. The object
of this substitution was to insert values on the order of the background
continuous spectrum for those estimates influenced by the annual compo-
nent."

Given the degree of smoothing involved in the lag window spectral estimates, this
does not appear to be necessary. The plots in the left column of Figure 6.11 were not

[Figure 6.11: three rows of panels (SLP, W150, W850), each plotting spectral estimates in dB against frequency (cycles/day) from 0.0 to 0.08; left column lag window, right column multitaper.]

Figure 6.11: Univariate spectral analysis of Canton Island data. The left col-
umn shows the lag window spectral estimates using the Parzen smoothing window
(m = 228) of the three atmospheric series with the dots representing the periodogram
estimates. The right column shows the corresponding multitaper spectral estimates
using K = 5 dpss data tapers (NW = 4).

adjusted this way and show no obvious difference for frequencies close to the annual
frequency. The broad-band feature apparent in the lag window spectral estimates is
seen in the multitaper spectral estimates as multiple peaks in the frequency band of
interest.
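
For reference, the adjustment quoted from Madden and Julian (1971) might be sketched as follows. The quotation does not say which four estimates are averaged; we assume the two nearest on each side of the affected frequency. The function name is hypothetical.

```python
import numpy as np

def replace_with_neighbors(pgram, freqs, targets):
    """Replace the estimate nearest each target frequency with the mean of
    four adjacent estimates (assumed: two on each side)."""
    pgram = np.asarray(pgram, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    out = pgram.copy()
    for f0 in targets:
        k = int(np.argmin(np.abs(freqs - f0)))        # nearest Fourier frequency
        idx = [i for i in (k - 2, k - 1, k + 1, k + 2) if 0 <= i < pgram.size]
        out[k] = pgram[idx].mean()                    # read from the unadjusted values
    return out
```

With daily data the targets would be 1/365.25 and 2/365.25 cycles/day for the annual and semiannual components.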

6.3.3 Bivariate Spectral Analysis

While the periodogram and direct spectral estimates are useful in the univariate case
for data analysis, they are not appropriate when moving into the realm of multivari-
ate spectral analysis of time series. This is because important statistical quantities,
such as the magnitude squared coherence (msc), are unity over all frequencies when cal-
culated through these methods; see Priestley (1981, p. 708) for an explanation of
this result. Hence, we concentrate our efforts on contrasting lag window bivariate
spectral estimators (as used in the original study) with multitaper bivariate spectral
estimators.
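
The reason for this degeneracy can be seen in one line. Writing $J_X(f)$ and $J_Y(f)$ for the (possibly tapered) discrete Fourier transforms of the two series, a notation we introduce here only for illustration, the unsmoothed cross-spectral and spectral estimates are products of these transforms, so the estimated msc collapses to one wherever both transforms are non-zero:

```latex
\[
\frac{\bigl|\hat{S}_{XY}(f)\bigr|^2}{\hat{S}_X(f)\,\hat{S}_Y(f)}
  = \frac{\bigl|J_X(f)\,J_Y^*(f)\bigr|^2}{|J_X(f)|^2\,|J_Y(f)|^2}
  = 1 .
\]
```

Averaging over lags, frequencies, or tapers before forming the ratio is what allows the estimate to fall below one.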
We first look at the co-spectra between station pressure and 850 mb wind speed,
and between 150 mb wind speed and 850 mb wind speed. Figure 6.12 shows estimates of the
co-spectra using a lag window spectral estimator and a multitaper spectral estimator.
The lag window co-spectra are similar in shape to those reported in the original paper,
differing only in magnitude. The multitaper co-spectra exhibit multiple peaks
in the frequency range where the lag window estimates show broad peaks, with large peaks
around $f = 0.025$ being the most dominant feature in that frequency band.
The left column of Figure 6.13 shows the estimated lag window msc for pairwise
comparisons between the three atmospheric time series. We can test, at the $\alpha$ level
of significance, the null hypothesis of zero msc by checking the estimated msc, on a
frequency by frequency basis, against $1 - \alpha^{2/(\nu - 2)}$ and rejecting if the estimated msc
exceeds it (Koopmans 1974, p. 284). The parameter $\nu$ is the number of equivalent
degrees of freedom associated with the spectral estimates, which is identical to the

[Figure 6.12: two panels (Lag Window, Multitaper) plotting estimated co-spectrum against frequency from 0.0 to 0.08; legend: SLP / 850 mb, 150 mb / 850 mb.]

Figure 6.12: Estimated co-spectra for the Canton Island data. The left panel displays
the lag window co-spectral estimates using a Parzen smoothing window (m = 228)
and the right panel displays the multitaper co-spectral estimates using K = 5 dpss data
tapers (NW = 4).

univariate case. Using Table 269 in Percival and Walden (1993),
$$\nu = \frac{3.71\,N}{m\,C_h} = \frac{3.71\,(3591)}{228\,(1.12)} \approx 52.$$
We reject the null hypothesis of zero msc in the first two plots around what is
most likely an annual frequency or close to it. A broad-band peak in the msc is
observed from frequencies $0.0134 \le f \le 0.0297$ between station pressure and 850 mb
wind speed, corresponding to a 33–75 day oscillation, and from frequencies $0.0150 \le
f \le 0.0304$ between 150 mb wind speed and 850 mb wind speed, corresponding to a
33–66 day oscillation.
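
The degrees of freedom and the critical value used in this test are simple to compute. The sketch below takes the constants exactly as they appear in the text (3.71 for the Parzen window, $C_h = 1.12$ for the taper); the function names are hypothetical.

```python
def lag_window_edof(n, m, c_h, window_constant=3.71):
    """Equivalent degrees of freedom nu = constant * N / (m * C_h)."""
    return window_constant * n / (m * c_h)

def msc_critical_value(alpha, nu):
    """Reject zero msc at level alpha when the estimated msc exceeds this value."""
    return 1.0 - alpha ** (2.0 / (nu - 2.0))

nu_parzen = lag_window_edof(n=3591, m=228, c_h=1.12)
threshold_05 = msc_critical_value(0.05, nu_parzen)
threshold_01 = msc_critical_value(0.01, nu_parzen)
```

The critical value falls as the degrees of freedom grow, so a heavily smoothed estimate can declare rather small coherences significant.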
The right column of Figure 6.13 shows the estimated multitaper msc for pairwise

[Figure 6.13: three rows of panels (SLP and W150, SLP and W850, W150 and W850), each plotting estimated msc against frequency (cycles/day) from 0.0 to 0.08; left column lag window, right column multitaper.]

Figure 6.13: Magnitude squared coherence of the Canton Island data. The two horizontal
lines in each plot mark the $\alpha = 0.05$ (dotted) and $\alpha = 0.01$ (dashed) significance
levels of the test for non-zero msc. The left column contains the lag window spectral
estimates and the right column the corresponding multitaper spectral estimates.

comparisons between the three atmospheric time series. The first five dpss data
tapers (NW = 4) were used to compute these estimates. We can test for non-zero
msc as before. Using K = 5 data tapers gives us $\nu = 2K = 10$ degrees of freedom.
We reject the null hypothesis of zero msc in all three plots around $f \approx 0.0036$,
which corresponds to a period of around 276 days. A second peak is found around
$f \approx 0.0058$, which corresponds to a period of around 171 days, between station
pressure and 850 mb wind speed. The group of frequencies which peak near the
frequencies of the Madden–Julian oscillation cover a period of around 37–40 days,
slightly shorter and much narrower a range than the 41–53 days observed in
Madden and Julian (1971).
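
A multitaper msc estimate of this kind can be sketched as below, again assuming SciPy's `scipy.signal.windows.dpss` and equal weighting of the tapers; the function name is hypothetical. Because the same scaling appears in numerator and denominator, the taper normalization cancels.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_msc(x, y, nw=4.0, k=5, dt=1.0):
    """Magnitude squared coherence from k dpss-tapered transforms."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    tapers = dpss(x.size, nw, Kmax=k)                  # shape (k, N)
    jx = np.fft.rfft(tapers * x, axis=1)
    jy = np.fft.rfft(tapers * y, axis=1)
    sxy = np.mean(jx * np.conj(jy), axis=0)            # cross-spectrum estimate
    sxx = np.mean(np.abs(jx) ** 2, axis=0)
    syy = np.mean(np.abs(jy) ** 2, axis=0)
    return np.fft.rfftfreq(x.size, d=dt), np.abs(sxy) ** 2 / (sxx * syy)
```

Calling the function with `k=1` returns an msc of one at every frequency, which is the degeneracy noted at the start of this section for unsmoothed estimators.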
Several analyses of the estimated multitaper msc were performed between station
pressure and 150 mb wind speed in order to determine if significant bias is introduced
by using too many data tapers. Hypothesis testing using 2 data tapers is difficult
given only $\nu = 4$ degrees of freedom. For 3 data tapers a small peak occurs around
frequency $f = 0.0214$ (approximately a 47 day oscillation), but the msc appears to
be contaminated with several spikes from the still high variability of the multitaper
estimate. With 4 or more data tapers, nothing except the annual frequency appears to
be significant. Hence, while some leakage may be present in this series, non-zero
magnitude squared coherence cannot be tested for without a sufficient number of
data tapers.

6.3.4 Wavelet Analysis


The spectral analysis performed by Madden and Julian (1971) used "modern" tech-
niques for the late 1960s. The multitaper spectral analysis in Section 6.3.3 utilized
time series analysis techniques from the middle 1980s. Here we jump to "modern"
methods by using discrete wavelet transforms. The goal of this analysis is to compare
and contrast the results obtained using wavelet methodology with those of standard
spectral analysis.

[Figures 6.14a to 6.14c: for each series, stacked panels D1 through D8 and S8 plotted against time, 1958 to 1966.]

Figure 6.14a: Multiresolution analysis of station pressure series collected at Canton
Island (2.8°S, 171.7°W) using the D(4) wavelet and MODWT. The wavelet details
$\tilde{D}_1, \tilde{D}_2, \tilde{D}_3, \ldots, \tilde{D}_8$ are associated with variations on scales of 1, 2, 4, ..., 256 days and
the wavelet smooth $\tilde{S}_8$ is associated with variations of 512 days or longer.

Figure 6.14b: Multiresolution analysis of 150 mb wind speed series collected at Canton
Island (2.8°S, 171.7°W) using the D(4) wavelet and MODWT. The wavelet details
$\tilde{D}_1, \tilde{D}_2, \tilde{D}_3, \ldots, \tilde{D}_8$ are associated with variations on scales of 1, 2, 4, ..., 256 days and
the wavelet smooth $\tilde{S}_8$ is associated with variations of 512 days or longer.

Figure 6.14c: Multiresolution analysis of 850 mb wind speed series collected at Canton
Island (2.8°S, 171.7°W) using the D(4) wavelet and MODWT. The wavelet details
$\tilde{D}_1, \tilde{D}_2, \tilde{D}_3, \ldots, \tilde{D}_8$ are associated with variations on scales of 1, 2, 4, ..., 256 days and
the wavelet smooth $\tilde{S}_8$ is associated with variations of 512 days or longer.


A partial MODWT, of order J = 8, was performed on the three time series of
interest using the D(4) wavelet filter and is displayed in Figures 6.14a–c. We can see
how the approximate bandpass nature of the MODWT is able to separate events
on different scales. For example, two "spikes" in the station pressure series, in 1959
and 1961, only show up in the first scale wavelet detail $\tilde{D}_1$ of the multiresolution
analysis. We also observe a slight annual oscillation in scale 8, which captures the
frequencies $1/512 \le f \le 1/256$, and a gradual peak spanning 1962–1964 in the
wavelet smooth $\tilde{S}_8$. Remember, the wavelet detail at the fifth scale $\tilde{D}_5$ captures
frequencies $1/64 \le f \le 1/32$ and is associated with changes of 16 days; i.e., the
difference of weighted averages each comprised of 16 values. This is therefore the
focus of our attention for investigating the Madden–Julian oscillation.
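
The partial MODWT used here can be sketched with the usual pyramid algorithm and circular filtering. The code below is an illustrative implementation, not the software used for the figures; it hard-codes the D(4) filter and returns coefficients rather than the details and smooth of the multiresolution analysis. The function names are hypothetical.

```python
import numpy as np

def d4_modwt_filters():
    """MODWT wavelet and scaling filters for the Daubechies D(4) wavelet."""
    s3 = np.sqrt(3.0)
    g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))  # scaling
    h = np.array([(-1) ** l * g[g.size - 1 - l] for l in range(g.size)])  # wavelet
    return h / np.sqrt(2), g / np.sqrt(2)       # MODWT rescaling by 1/sqrt(2)

def modwt(x, J):
    """Partial MODWT of order J: returns (J x N wavelet coefficients, level-J scaling)."""
    v = np.asarray(x, dtype=float).copy()
    n = v.size
    ht, gt = d4_modwt_filters()
    t = np.arange(n)
    W = []
    for j in range(1, J + 1):
        w_j = np.zeros(n)
        v_j = np.zeros(n)
        for l in range(ht.size):
            idx = (t - 2 ** (j - 1) * l) % n    # circular shift, filter dilated by 2^(j-1)
            w_j += ht[l] * v[idx]
            v_j += gt[l] * v[idx]
        W.append(w_j)
        v = v_j
    return np.array(W), v
```

A useful check on any implementation is that the squared coefficients across all levels, plus the squared scaling coefficients, sum to the squared norm of the input series.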
The wavelet variance for each series is shown in Figure 6.15 plotted on a log-log
scale. Remember, longer scales correspond to lower frequency bands. The wind speed
series show approximately an order of magnitude increase in variability at the lower
scales, meaning there is more high frequency noise in their signals. In each series
there is an abrupt drop in energy from changes of 16 days ($\lambda_5$) to changes of 32 days
($\lambda_6$), although this is less apparent in the 150 mb wind series. As previously stated,
this fifth scale captures the exact frequency range we would expect to be affected by
the MJO.
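
Given the level-j MODWT coefficients of a series, a wavelet variance estimate is the mean square of the coefficients that are free of the circular boundary. The sketch below assumes the unbiased form, which discards the first $L_j - 1$ coefficients with $L_j = (2^j - 1)(L - 1) + 1$ for a filter of length L; the function name is hypothetical and confidence intervals are not computed.

```python
import numpy as np

def wavelet_variance(w_j, j, filter_length=4):
    """Mean square of level-j MODWT coefficients, boundary coefficients excluded."""
    w_j = np.asarray(w_j, dtype=float)
    l_j = (2 ** j - 1) * (filter_length - 1) + 1   # width of the level-j filter
    if w_j.size < l_j:
        raise ValueError("series too short for this level")
    interior = w_j[l_j - 1:]                       # unaffected by circular filtering
    return np.mean(interior ** 2)
```

For the D(4) filter at the fifth level, $L_5 = 94$, so 93 coefficients are discarded from each series.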
The wavelet correlation for the three pairwise combinations of the time series is
shown in Figure 6.16. The confidence intervals for the wavelet correlation between
station pressure and 150 mb wind speed include zero for scales between 8 and 64 days.
This includes the frequency range of interest ($\lambda_5$) and agrees with the lack of signif-
icant frequencies when analyzing the magnitude squared coherence between the two
series. Both the wavelet correlation between station pressure and 850 mb wind speed,
and between 150 mb and 850 mb wind speed are significantly different from zero at
scales of 4 to 32 days, with a peak (positive and negative, respectively) in scale 5 in both plots.
The wavelet cross-correlation, between sea level pressure and 150 mb wind speed,

[Figure 6.15: two panels plotting wavelet variance against scale (days; 1, 2, 4, ..., 128) on log-log axes; upper panel sea level pressure and wind speed 150, lower panel sea level pressure and wind speed 850.]

Figure 6.15: MODWT estimated wavelet variance for Canton Island time series using
a D(4) wavelet filter. The station pressure series is plotted in both the upper and
lower plots for reference. The shaded regions form an approximate 95% confidence
interval.

showed no strong patterns with a range of $-0.19$ to $0.20$ for scale $\lambda_5$. When comparing
sea level pressure and 850 mb wind speed, they are most positively correlated at a
lag of +2 days ($\tilde{\rho}_{2,850/\mathrm{SLP}}(\lambda_5) = 0.412$), and when comparing 150 mb and 850 mb
wind speed, they are most negatively correlated at a lag of +1 day ($\tilde{\rho}_{1,150/850}(\lambda_5) =
-0.309$). These results, which correspond to the 850 mb wind speed trailing the

[Figure 6.16: three panels (SLP / 150 mb, SLP / 850 mb, 850 mb / 150 mb) plotting wavelet correlation (-1.0 to 1.0) against scale (days; 1, 2, 4, ..., 128).]

Figure 6.16: MODWT estimated wavelet correlation for Canton Island time series
using a D(4) wavelet filter. The plots are, from top to bottom, station pressure
versus 150 mb wind, station pressure versus 850 mb wind, and 150 mb wind versus
850 mb wind. The shaded regions form approximate 95% confidence intervals.

station pressure by 2 days and the 850 mb wind speed leading the 150 mb wind speed
by 1 day, agree with findings in Madden and Julian (1971) where the 850 mb wind
was nearly in phase ($\approx 10^\circ$) with station pressure and the two winds were found to
be almost out of phase ($177^\circ$), respectively.
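
The lagged wavelet correlations reported above can be illustrated with the following sketch, which operates on the level-j MODWT coefficients of two series. The sign convention for the lag (coefficients of the first series at time t paired with those of the second at time t + lag), the exclusion of boundary coefficients, and the normalization are assumptions made for illustration and may differ from the estimator defined in Chapter 5. The function name is hypothetical.

```python
import numpy as np

def wavelet_cross_correlation(wx, wy, j, lag=0, filter_length=4):
    """Correlation between level-j MODWT coefficients, wx at t and wy at t + lag."""
    l_j = (2 ** j - 1) * (filter_length - 1) + 1
    x = np.asarray(wx, dtype=float)[l_j - 1:]       # drop boundary coefficients
    y = np.asarray(wy, dtype=float)[l_j - 1:]
    n = x.size
    if abs(lag) >= n:
        raise ValueError("lag too large for the number of interior coefficients")
    if lag >= 0:
        a, b = x[:n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:n + lag]
    cov = np.sum(a * b) / n                         # lagged wavelet covariance
    return cov / np.sqrt(np.mean(x ** 2) * np.mean(y ** 2))
```

Setting `lag=0` gives the wavelet correlation plotted in Figure 6.16; scanning over a range of lags and locating the extreme value gives lead/lag statements of the kind made above.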

6.3.5 Conclusions
It is difficult to be overly critical of the time series analysis techniques of Madden
and Julian (1971). Given the year of the discovery, they used the most reasonable
techniques available, namely, lag window bivariate spectral estimates. The amount
of tapering applied to the series (10% cosine) appears adequate when compared to
stronger data tapers. Their results led to the discovery of a broad-band feature in
atmospheric readings from the tropical Pacific Ocean.
Utilizing the multitaper techniques of Thomson (1982), we obtain "smoothed
versions" of univariate and bivariate spectral estimators. Given that the spectral
bandwidth of a multitaper spectral estimator is smaller than that of a corresponding lag
window spectral estimator (in general), we will not over-smooth the spectra and
potentially lose interesting features. In the frequency range of interest, we observe
several peaks instead of one broad peak in the univariate spectra. This translates
into a very choppy estimated co-spectrum and magnitude squared coherence. When
testing the magnitude squared coherence, we observe a period of 37–40 days instead
of the 41–53 day oscillation reported in Madden and Julian (1971).
We have shown how wavelet analysis techniques have captured, and adequately
summarized, information about the Madden–Julian oscillation. The ability of the
DWT to approximately bandpass filter a time series alleviates some of the pre-
processing performed in spectral analysis of atmospheric time series, such as removal
of annual, semi-annual, and seasonal trends. These will naturally be partitioned by
the DWT. Wavelet techniques also open up the possibility of answering questions
about how the time series vary with time.

6.4 Wavelet Analysis of Covariance Between the Southern Oscillation Index and Madden–Julian Oscillation

6.4.1 Introduction

Temporal variations in the Madden–Julian Oscillation (MJO) and its relationship
with El Niño–Southern Oscillation (ENSO) events have been previously investigated
using classical spectral analysis; see, e.g., Madden and Julian (1994) and references
therein. Anderson, Stevens, and Julian (1984) filtered two time series, atmospheric
relative angular momentum (4 years) and the 850–200 mb shear of the zonal wind at
Truk Island (25 years), with a filter designed to pass the frequency band corresponding
to periods of 32–64 days. They noted, with respect to the Truk Island series,
a possible association with increased amplitude of the oscillation during the 1956–
57, 1972–73, and 1976–77 ENSO warm events, but also that the duration of these
increases was much longer than the ENSO events. Madden (1986) performed a
seasonally varying cross-spectral analysis on nearly twenty time series of rawinsonde
data from tropical stations around the world. He found that the MJO appears strongest
during December–February and weakest during June–August, and that it is always stronger
in the western Pacific and Indian oceans than elsewhere. Gray (1988) performed a
correlation analysis between daily station pressure data from Truk (7°N, 152°W),
Balboa (9°N, 80°W), Darwin (12°S, 131°E) and Gan (1°S, 73°E), with seasonal sea
surface temperature anomalies on a 5° grid. The data were partitioned into ENSO
and non-ENSO years; in non-ENSO years a strong seasonal shift in frequency was
found at all sites except Truk Island. Kuhnel (1989) investigated the characteristics
of a 40–50 day oscillation in cloudiness for the Australo–Indonesian region. Using
data on a 10° by 5° grid, regions in the eastern Indian Ocean and western Pacific
Ocean were found to have a pronounced 40–50 day peak with no obvious seasonal
variation. Another region in the Indian Ocean (5–15°S, 95–100°E) showed a stronger
oscillation in the March–June period. Regions around 5–15°S over northern Australia
and in the Pacific Ocean showed a much stronger 40–50 day oscillation during the
Australian monsoon season, from December to March, than during the rest of the year. The
40–50 day cloud amount oscillation did not appear to be affected by warm ENSO
events. Madden and Julian (1994) note the broadband nature of the oscillation by
comparing the station pressure spectra for Truk Island (7.4°N, 151.8°W) during two
time spans: 1967 to 1979 and 1980 to 1985. The MJO appears to have a 26-day
period in the early 1980s.
The relationship between ENSO events and the MJO is a topic which could benefit
markedly from wavelet techniques. To investigate how these two atmospheric
phenomena interact, we will analyze two time series. The first is the Southern
Oscillation Index (SOI), which is an indicator of ENSO and usually defined to be
the difference between monthly averages of the station pressure series from climate
stations at Darwin, Australia (130.8°E, 12.4°S) and Tahiti, French Polynesia (149°W,
14°S); see Figure 6.9 for the locations of these climate stations. It was first introduced
by Walker (1928) and came from the observation that pressure in the tropical Pacific
Ocean is inversely related to pressure in the Indian Ocean.
In our case, we deviate from the usual definition of the SOI by introducing a daily
version of it. I obtained daily pressure readings from Darwin, Australia, and Tahiti,
French Polynesia, starting on 1 June 1957 and continuing to 31 December 1992
(N = 12,998) and differenced them; see Figure 6.17. The distance of the stations from
the equator is apparent in the strong annual component in the time series. The
measurements in the summer and winter of 1983 appear to be higher than those in
adjacent years. This approximately corresponds to a large ENSO event in the early 1980s.
Any missing values were filled in using one-step-ahead predictions from an
ARIMA(3,1,0) model applied to the series (Jones 1980).
I also obtained daily station pressure readings from Truk Island (7.4°N, 151.8°W)
as an indicator of the MJO. This series also spans 1 June 1957 to 31 Decem-
ber 1992; see Figure 6.17. Unlike the SOI, there is no apparent annual trend since

[Figure 6.17: two panels (Truk, SOI) plotting pressure (mb) against time, 1960 to 1990.]

Figure 6.17: Station pressure series for Truk Island (7.4°N, 151.8°W) and the South-
ern Oscillation Index. The "staggered" look of the Truk Island series prior to 1971 is
the result of rounding to the nearest millibar.

the station is quite close to the equator. Missing values were dealt with in the same
manner as described for the SOI.

6.4.2 Time-Domain and Spectral Analysis

We now propose to analyze the SOI and Truk Island station pressure series using
standard time-domain (e.g., the cross-correlation sequence) and Fourier (e.g., the
cross-spectrum) techniques. The cross-correlation sequence (ccs) is typically esti-
mated by
$$\hat{\rho}^{(p)}_{\tau,XY} = \frac{\hat{C}^{(p)}_{\tau,XY}}{\left[\hat{s}^{(p)}_{0,X}\,\hat{s}^{(p)}_{0,Y}\right]^{1/2}}$$
(see, e.g., Brockwell and Davis (1991, p. 29)), utilizing the periodogram-based es-
timates of the acvs for $\{X_t\}$ and $\{Y_t\}$, and of the ccvs. The estimated cross-correlation
sequence for the SOI and Truk Island series is shown in Figure 6.18. The maximum
occurs at a lag of +1 days. We also observe the characteristic broad-band peak com-
monly found in atmospheric time series from this region, with an approximate range
of 35–55 day lags.
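
For concreteness, one common form of the periodogram-based (biased) estimators entering this ratio is written out below. The direction of the lag, with $X_t$ paired against $Y_{t+\tau}$, is an assumption on our part; the convention used elsewhere in this thesis should take precedence if it differs.

```latex
\[
\hat{C}^{(p)}_{\tau,XY} = \frac{1}{N} \sum_{t} (X_t - \bar{X})(Y_{t+\tau} - \bar{Y}),
\qquad
\hat{s}^{(p)}_{0,X} = \frac{1}{N} \sum_{t=1}^{N} (X_t - \bar{X})^2 ,
\]
```

where the first sum runs over all $t$ for which both $X_t$ and $Y_{t+\tau}$ are observed, and $\hat{s}^{(p)}_{0,Y}$ is defined analogously. Dividing by $N$ rather than by the number of terms in the sum keeps the estimated ccs between $-1$ and $1$.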

[Figure 6.18: ccs (about -0.05 to 0.25) plotted against lag (days) from -200 to 200.]

Figure 6.18: Estimated cross-correlation sequence for the Southern Oscillation Index
and Truk Island station pressure series.

A spectral analysis of these data provides very little insight into the possible rela-
tionship between ENSO events and the MJO. The multitaper co-spectrum between
the SOI and Truk Island station pressure series exhibits large peaks at annual and
inter-annual frequencies, and only a very slight peak in the frequency range of the
MJO. With the co-spectrum producing values so close to zero, the multitaper msc is
very erratic and gives a large number of significant peaks over $0 \le f \le 0.08$. Hence,
classical bivariate spectral estimation of these series does not exhibit any indication
of a possible relationship between these two series.

6.4.3 Wavelet Analysis


Daily measurements allow us to apply the MODWT and analyze the sub-series which
correspond to filtered series with approximate pass-band $1/2^{j+1} \le |f| \le 1/2^j$. Due to
the approximate bandpass nature of the MODWT, with the approximation improving
as the length of the wavelet filter increases, it is unnecessary to remove any annual
or semiannual components (a similar argument is made when bandpass filtering at-
mospheric time series in Anderson et al. (1984)), which should be roughly captured
in the $\lambda_7$ and $\lambda_8$ scales. The MJO is known to occur with periods of around 30–60
days. We therefore expect to see it in scale $\lambda_5$, associated with changes of 16 days
and an approximate pass-band of $1/64 \le |f| \le 1/32$.
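
The correspondence between level, scale, and pass-band used in this argument is easy to tabulate. The helper below is a small sketch with a hypothetical name; it simply evaluates $\lambda_j = 2^{j-1} \Delta t$ and the nominal band $1/2^{j+1} \le |f| \le 1/2^j$ for a sampling interval of one day.

```python
def modwt_band(j, dt=1.0):
    """Scale, nominal pass-band, and corresponding periods for MODWT level j."""
    f_low = 1.0 / (2 ** (j + 1) * dt)
    f_high = 1.0 / (2 ** j * dt)
    return {
        "scale": 2 ** (j - 1) * dt,      # lambda_j, in units of dt
        "f_low": f_low,
        "f_high": f_high,
        "period_min": 1.0 / f_high,      # shortest period in the band
        "period_max": 1.0 / f_low,       # longest period in the band
    }

mjo_band = modwt_band(5)
annual_band = modwt_band(8)
```

Level 5 covers periods of 32 to 64 days, which brackets most of the 30–60 day range quoted for the MJO, and level 8 covers 256 to 512 days, which contains the annual cycle.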
A partial MODWT (J = 10) was applied to each series using the D(4) wavelet
filter. Figures 6.19 and 6.20 give multiresolution analyses of the Truk Island station
pressure series and SOI, respectively. For the Truk Island series, we observe only
a slight annual trend in $\tilde{D}_8$ and the obvious disruption in the early 1980s appears
to primarily affect scales $\lambda_7$ and $\lambda_8$. The fifth scale appears to fade in and out in
magnitude with no apparent pattern. The SOI multiresolution analysis exhibits a
strong annual trend where the disturbance in the early 1980s affects the scales $\lambda_8$
and $\lambda_9$. The scale associated with the MJO, $\lambda_5$, exhibits numerous bursts across
time.
Figure 6.21 gives the estimated wavelet variance for each series (cf. Section 3.4).
We see that the SOI has a higher amount of variability in every scale; i.e., the approx-
imate 95% confidence intervals do not overlap except for scale $\lambda_8$, which corresponds
to the annual frequency. Here, the estimated wavelet variance for the SOI is more
than an order of magnitude higher than that for Truk, but the associated error is also

[Figures 6.19 and 6.20: for each series, stacked panels D1 through D10 and S10 plotted against time, 1960 to 1990.]

Figure 6.19: Multiresolution analysis of station pressure series collected at Truk Island
(7.4°N, 151.8°W), from June 1957 through December 1992, using the D(4) wavelet
filter and the MODWT. The wavelet details $\tilde{D}_1, \tilde{D}_2, \tilde{D}_3, \ldots, \tilde{D}_{10}$ are associated with
variations on scales of 1, 2, 4, ..., 1024 days and the wavelet smooth $\tilde{S}_{10}$ is associated
with variations of 2048 days or longer.

Figure 6.20: Multiresolution analysis for the daily Southern Oscillation Index, from
June 1957 through December 1992, using the D(4) wavelet filter and the MODWT.
The wavelet details $\tilde{D}_1, \tilde{D}_2, \tilde{D}_3, \ldots, \tilde{D}_{10}$ are associated with variations on scales of
1, 2, 4, ..., 1024 days and the wavelet smooth $\tilde{S}_{10}$ is associated with variations of
2048 days or longer.

[Figure 6.21: wavelet variance plotted against scale (days; 1, 2, 4, ..., 512) on log-log axes for Truk Station Pressure and the Southern Oscillation Index.]

Figure 6.21: MODWT estimated wavelet variance for the Southern Oscillation Index
and Truk Island station pressure series.

quite large. The Truk Island estimated wavelet variance appears to follow the SOI
estimates in shape, with much less emphasis at the semi-annual and annual scales.
Figure 6.22 shows the estimated wavelet correlation between the SOI and Truk
station pressure series at a lag of zero days. The wavelet correlation appears to be
significantly different from zero for all scales except $\lambda_6$ and $\lambda_7$, giving moderately
positive correlations. These results are liberal for scales $\lambda_7$ and $\lambda_8$ because the confidence
intervals assume approximately zero correlation among the products of DWT wavelet
coefficients. This is not true for scales $\lambda_7$ and $\lambda_8$, since they involve the semi-annual
and annual oscillations, and residual autocorrelations in the time-varying wavelet
covariance persist. The significant correlation at $\lambda_5$ lends credibility to the hypothesis
of an association between the SOI and MJO.

[Figure 6.22: wavelet correlation (-1.0 to 1.0) plotted against scale (days; 1, 2, 4, ..., 512).]

Figure 6.22: MODWT estimated wavelet correlation for the Southern Oscillation
Index and Truk Island station pressure series. The transformed confidence intervals
were computed using Section 5.4.2.

[Figure 6.23: stacked panels d1 through d10 plotted against lag (days) from -200 to 200.]

Figure 6.23: MODWT estimated wavelet cross-correlation for the Southern Oscilla-
tion Index and Truk Island station pressure series for lags up to 240 days. The
confidence intervals from Figure 6.22 apply on a point-to-point basis. The positive
peak in the fifth scale $\lambda_5$ is at a lag of 0 days.

If we are to investigate a possible lead/lag relationship between the two series, then
the wavelet cross-correlation must be estimated for various lags. Figure 6.23 shows the
estimated wavelet cross-correlation between the SOI and Truk Island station pressure
series. The large positive peak in the first five scales is at a lag of 1 day for scales
1-2, a lag of 2 days for scale 3, a lag of 4 days for scale 4 and zero days for scale
5. In the fifth scale, the largest negative value is at a lag of 20 days. The higher
scales do not show any apparent trend when looking at lags up to 240 days. A possible
interpretation for the first four scales is that of weather patterns as they travel from
West to East (cf. Figure 6.9). The abrupt change in the lead/lag relationship of the
wavelet cross-correlation at the fifth scale is most likely due to the MJO. Patterns
in higher scales (lower frequencies) correspond to semi-annual, annual, and
inter-annual trends. The ability of the wavelet cross-covariance to analyze (decompose)
the usual covariance between these two series on a scale by scale basis allows these
interpretations to be made.
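
As an illustration only, the lagged wavelet cross-correlation discussed above can be
sketched in Python. This is not the code used for the analysis in this chapter: the
Haar filter, the synthetic series, the biased (divide by n) normalization and the
function names haar_modwt and wavelet_cross_correlation are all assumptions made for
the example.

```python
import numpy as np

def haar_modwt(x, levels):
    """Haar MODWT via the pyramid algorithm with circular boundary handling.

    Returns the list of wavelet coefficient vectors (levels 1..levels)
    and the final vector of scaling coefficients.
    """
    v = np.asarray(x, dtype=float)
    coeffs = []
    for j in range(1, levels + 1):
        shifted = np.roll(v, 2 ** (j - 1))   # shifted[t] = v[t - 2^(j-1)], circularly
        coeffs.append(0.5 * (v - shifted))   # level-j wavelet coefficients
        v = 0.5 * (v + shifted)              # level-j scaling coefficients
    return coeffs, v

def wavelet_cross_correlation(wx, wy, max_lag):
    """Correlation between wx[t] and wy[t + lag] for lag = -max_lag..max_lag."""
    n = len(wx)
    sx = np.sqrt(np.mean(wx ** 2))           # wavelet coefficients have mean ~ 0
    sy = np.sqrt(np.mean(wy ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    rho = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            prod = wx[: n - lag] * wy[lag:]
        else:
            prod = wx[-lag:] * wy[: n + lag]
        rho[i] = np.sum(prod) / (n * sx * sy)  # dividing by n keeps |rho| <= 1
    return lags, rho

# Synthetic example (hypothetical data): y is x delayed by 3 samples plus noise.
rng = np.random.default_rng(1)
n = 2048
x = rng.standard_normal(n)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(n)

level = 1
boundary = 2 ** level - 1                    # coefficients affected by circularity
wx = haar_modwt(x, level)[0][level - 1][boundary:]
wy = haar_modwt(y, level)[0][level - 1][boundary:]

lags, rho = wavelet_cross_correlation(wx, wy, max_lag=10)
peak_lag = int(lags[np.argmax(rho)])         # lag at which the correlation peaks
```

With this sign convention a positive peak lag means the second series trails the
first, which is how a lead/lag reading of Figure 6.23 would be made scale by scale.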
Although direct comparison between Figure 6.23 and Figure 6.18 is not appropriate,
because the wavelet correlation does not decompose the correlation between two
stationary processes, the wavelet covariance does decompose the covariance between
two time series. Since the wavelet correlation is simply the wavelet covariance
standardized at each scale, the shape of each wavelet cross-correlation is the same as
that of the corresponding wavelet cross-covariance, even though the magnitudes differ.
Hence, we may make a rough comparison between the two, keeping in mind the facts just
stated. The first obvious difference is the fact that $\hat{\rho}^{(p)}_{XY}$ is
positive for all negative lags. Looking at Figure 6.23, we see that
the larger scales (9 and 10) are all positive and contribute to this feature, whereas
for positive lags they are close to zero and allow the annual scale (8) to dominate.
The two dips on either side of the peak at lag +1 are the superposition of the first six
scales in Figure 6.23. The subsequent peak around a lag of +40 days is a result of
the negative correlations for scales 5 and 6 pushing down the annual correlation
(scale 8). It is most likely not an interesting feature in the association between these
two processes. The interaction of the correlation structure on a scale by scale basis
results in a quite complex looking cross-correlation sequence. However, when broken
up with the wavelet transform, a few simple, yet distinct, patterns appear which may
be more easily interpreted.
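
The statement that the wavelet covariance decomposes the covariance, while the
wavelet correlation does not decompose the correlation, can be checked numerically.
The Python sketch below is an illustration under stated assumptions: it uses the Haar
filter with circular boundary handling (for which the decomposition is exact), keeps
all coefficients, and works on synthetic data; the name haar_modwt is hypothetical.

```python
import numpy as np

def haar_modwt(x, levels):
    """Haar MODWT (circular); returns wavelet coefficient list and final smooth."""
    v = np.asarray(x, dtype=float)
    coeffs = []
    for j in range(1, levels + 1):
        shifted = np.roll(v, 2 ** (j - 1))
        coeffs.append(0.5 * (v - shifted))
        v = 0.5 * (v + shifted)
    return coeffs, v

rng = np.random.default_rng(0)
n, levels = 1024, 6
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)
x = x - x.mean()
y = y - y.mean()

wx, vx = haar_modwt(x, levels)
wy, vy = haar_modwt(y, levels)

# Scale-by-scale wavelet covariance, plus the contribution of the scaling coefficients.
scale_cov = np.array([np.mean(a * b) for a, b in zip(wx, wy)])
smooth_cov = np.mean(vx * vy)
total_cov = np.mean(x * y)                    # usual sample covariance (means removed)

# Scale-by-scale wavelet correlation: each term is standardized separately,
# so these values are not pieces that add up to the overall correlation.
scale_cor = np.array([
    np.mean(a * b) / np.sqrt(np.mean(a ** 2) * np.mean(b ** 2))
    for a, b in zip(wx, wy)
])

decomposition_error = abs(scale_cov.sum() + smooth_cov - total_cov)
```

The covariance pieces add up to the sample covariance to within rounding error,
whereas each wavelet correlation is a separately standardized quantity in [-1, 1].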

6.4.4 Investigating Seasonal Variation in the Madden-Julian Oscillation


So far the analysis performed has largely ignored an important feature of discrete
wavelet transforms: their ability to extract information which is local in time. With
respect to these data, the question of whether or not the MJO changes over time is of
particular interest. Changing over time could have (at least) two potential meanings:
that the strength of the MJO is changing over time or that the frequency of the MJO
is changing over time. The first type of change would produce a pattern of increasing
and decreasing coefficients for the time-varying wavelet variance, i.e., the squared
wavelet coefficients at the same scale (recall that the MJO should be captured in the
scale 5 coefficients). The second type of change would not only produce differing
magnitudes of the time-varying wavelet variance, but, if the change in frequency was
large enough, a shift of large coefficients from one scale to another. Madden and
Julian (1994) investigated the latter type of change, with respect to the MJO at Truk
Island, using univariate spectral methods.
Let us investigate the possibility of a changing MJO over time by plotting the
time-varying wavelet variance for the Truk Island station pressure series and SOI,
individually. In Figure 6.24, we see the scale 5 time-varying wavelet variance for
November through April, what I roughly call "winter," and for May through October,
what I roughly call "summer." We notice that the variability is much greater in the
winter months when compared with the summer months, especially in the SOI. The
median value of winter wavelet variance is 0.22 versus just 0.15 for the summer with
respect to the Truk Island station pressure series. For the SOI, the median value of
winter wavelet variance is 0.48 compared with 0.23 in the summer. The most extreme
years, for winter wavelet variance, appear to be 1961, 1986 and 1992 for the SOI and,
for the Truk Island station pressure series, 1959, 1974, 1978 and 1990. These extreme
years for the Truk Island series, before 1990, agree with the results from Anderson,
Stevens, and Julian (1984).

[Figure 6.24 plot: four panels (Truk Station Pressure "Winter", Truk Station Pressure "Summer", SOI "Winter", SOI "Summer"); vertical axis "Time-Varying Wavelet Variance", horizontal axis "Time" (1960 to 1990).]

Figure 6.24: Time-varying wavelet variance for the Truk Island station pressure series
and SOI at the fifth scale (5), using the MODWT and D(4) wavelet filter. The
"winter" period corresponds to November through April and the "summer" period
corresponds to May through October.
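
The seasonal comparison described above (squared scale 5 coefficients split into
November through April and May through October, then summarized by their medians) can
be sketched as follows. This is a hedged illustration, not the original analysis: the
data are synthetic, the Haar filter stands in for the D(4) filter, and
haar_modwt_level is a hypothetical helper.

```python
import numpy as np

def haar_modwt_level(x, level):
    """Haar MODWT wavelet coefficients at a single level (circular boundary)."""
    v = np.asarray(x, dtype=float)
    w = v
    for j in range(1, level + 1):
        shifted = np.roll(v, 2 ** (j - 1))
        w = 0.5 * (v - shifted)
        v = 0.5 * (v + shifted)
    return w

# Daily time axis and calendar month (1..12) of each observation.
dates = np.arange("1957-01-01", "1997-01-01", dtype="datetime64[D]")
months = dates.astype("datetime64[M]").astype(int) % 12 + 1
winter = (months >= 11) | (months <= 4)       # November through April

# Synthetic series: a 45-day oscillation that is stronger in "winter", plus noise.
rng = np.random.default_rng(2)
t = np.arange(len(dates))
amplitude = np.where(winter, 1.5, 1.0)
series = amplitude * np.sin(2 * np.pi * t / 45.0) + rng.standard_normal(len(t))

# Time-varying wavelet variance at level 5: the squared wavelet coefficients.
tv_variance = haar_modwt_level(series, 5) ** 2

median_winter = np.median(tv_variance[winter])
median_summer = np.median(tv_variance[~winter])
```

The median is used here, as in the text, because a few extreme years would dominate
a seasonal mean of squared coefficients.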
It is evident, from the analysis presented here, that a seasonal pattern exists in the
Madden-Julian oscillation, even in locations close to the equator. This is a feature
not easily recognized using classical spectral techniques. This increased knowledge
of MJO variability changing with time is exciting, and will hopefully allow research
scientists to better describe similar physical phenomena.

6.4.5 Investigating ENSO Variation of the Madden-Julian Oscillation


We have already seen how the association between the Southern Oscillation Index and
station pressure series collected at Truk Island, for scales associated with the MJO,
exhibits different characteristics between "summer" and "winter" seasons. Now I
propose to qualitatively analyze the same association between differing periods of
ENSO activity. The SOI is a measure of the strength of the trade winds, where high
SOI (pressure difference from East to West) is associated with La Niña conditions and
low SOI (pressure difference from West to East) is associated with El Niño conditions.
To construct an interpretable indicator of ENSO activity, the last two wavelet
details and wavelet smooth from the multiresolution analysis of the daily SOI (cf.
Figure 6.20) were combined. The sample mean was removed and the series was then
inverted in order to agree with the conventional SOI (see Figure 6.25). Looking at
positive and negative values of this time series, there is good agreement between it
and the conventional SOI. We see the large El Niño events of 1981-1982 and
1986-1987, with a subsequent La Niña event in 1988-1989.
The time-varying wavelet variance for scale 5 of the station pressure series
collected at Truk Island, defined to be the squared scale 5 wavelet coefficients, is given
in the top half of Figure 6.26. The upper plot is the time-varying wavelet variance
during La Niña periods, when our measure of ENSO activity is positive, and the
lower plot is the time-varying wavelet variance during El Niño periods, when ENSO
activity is negative. There is no apparent difference between the two time series; the
median value for La Niña periods is 0.19 while the median value for El Niño periods
is 0.16.

[Figure 6.25 plot: indicator of ENSO activity against time (horizontal axis, 1960 to 1990).]

Figure 6.25: Indicator of ENSO activity, constructed by combining the last two
wavelet details and wavelet smooth from the multiresolution analysis of the Southern
Oscillation Index (cf. Figure 6.20), i.e., $\widetilde{D}_9 + \widetilde{D}_{10} + \widetilde{S}_{10}$.
The sample mean was removed and the series was inverted in order to agree with the
conventional SOI.
The time-varying wavelet covariance for scale 5, defined to be the product of scale
5 wavelet coefficients computed from the SOI and station pressure series collected
at Truk Island, is given in the bottom half of Figure 6.26. Again, the upper plot is
the time-varying wavelet covariance during La Niña periods and the lower plot is the
time-varying wavelet covariance during El Niño periods. Here, the wavelet covariance
during El Niño periods has much higher extreme values in the early 1990s, but it is
still difficult to distinguish between La Niña and El Niño periods. The median value
for La Niña periods is 0.036 while the median value for El Niño periods is 0.006.
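
The partitioning behind these medians can be sketched in Python. The sketch is an
illustration only: the threshold of 0.5 is my reading of the panel titles of Figure
6.26, the indicator and coefficient series are synthetic, and partition_medians is a
hypothetical name.

```python
import numpy as np

def partition_medians(values, indicator, threshold=0.5):
    """Median of `values` where indicator > threshold and where indicator < -threshold."""
    values = np.asarray(values, dtype=float)
    indicator = np.asarray(indicator, dtype=float)
    la_nina = indicator > threshold           # positive ENSO indicator
    el_nino = indicator < -threshold          # negative ENSO indicator
    return float(np.median(values[la_nina])), float(np.median(values[el_nino]))

# Synthetic stand-ins for the ENSO indicator and two sets of scale 5 coefficients.
rng = np.random.default_rng(3)
n = 4096
indicator = 2.0 * np.sin(2 * np.pi * np.arange(n) / 1500.0)
w_soi = rng.standard_normal(n)
w_truk = 0.3 * w_soi + rng.standard_normal(n)

# Time-varying wavelet variance (squares) and covariance (products), by ENSO phase.
variance_medians = partition_medians(w_truk ** 2, indicator)
covariance_medians = partition_medians(w_soi * w_truk, indicator)

# Small deterministic check of the partition logic.
check = partition_medians([1, 2, 3, 4, 5, 6], [1, 1, 1, -1, -1, -1])
```

Observations with an indicator between the two thresholds fall into neither group,
which keeps weak ENSO periods out of the comparison.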
Figure 6.27 provides similar information to that of Figure 6.26 for scale 4, which
is associated with shorter periods than the MJO. The time-varying wavelet variance
for the Truk Island station pressure series (top two plots) appears to have a greater
number of large coefficients in the La Niña periods. This could indicate a potential
frequency shift in the MJO similar to the one discussed in Gray (1988), who looked at
station pressure and sea surface temperature anomalies. The time-varying wavelet
covariances between the SOI and station pressure series collected at Truk Island (bottom
two plots) are quite similar, as was the case with scale 5.

[Figure 6.26 plot: four panels (Variance for ENSO > 0.5, Variance for ENSO < -0.5, Covariance for ENSO > 0.5, Covariance for ENSO < -0.5); vertical axis "Time-Varying Wavelet Quantities", horizontal axis "Time" (1960 to 1990).]

Figure 6.26: Time-varying wavelet quantities, for the scale associated with the
Madden-Julian oscillation (5), partitioned into El Niño and La Niña periods. The
upper two plots display the wavelet variance and the lower two display the wavelet
covariance.

[Figure 6.27 plot: four panels (Variance for ENSO > 0.5, Variance for ENSO < -0.5, Covariance for ENSO > 0.5, Covariance for ENSO < -0.5); vertical axis "Time-Varying Wavelet Quantities", horizontal axis "Time" (1960 to 1990).]

Figure 6.27: Time-varying wavelet quantities, for the scale associated with shorter
periods than the Madden-Julian oscillation (4), partitioned into El Niño and La
Niña periods. The upper two plots display the wavelet variance and the lower two
display the wavelet covariance.

Chapter 7
CONCLUSIONS AND FUTURE DIRECTIONS
This chapter contains ideas for future work that would complement the material
I have presented in the two major areas of my dissertation. Final comments are also
provided.

7.1 Distributional Results for Testing Homogeneity of Variance


Monte Carlo experiments were mainly used to obtain the quantiles for the test
statistic D when testing for homogeneity of variance, and the multiple variance change
procedure utilized its asymptotic distribution, which is proportional to a Brownian
bridge. Starting with the Haar wavelet, it may be possible to obtain analytical
expressions for the quantiles of D, with higher order wavelets following. This would
allow us to abandon Monte Carlo studies and not rely on "sufficient" sample sizes in
order to appeal to the asymptotic distribution of D. These results would be most
useful when testing multiple variance changes where the asymptotic distribution is
utilized exclusively. In addition, this may also provide insight into the distribution of
D when the MODWT wavelet coefficients are used instead of the DWT coefficients.
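
For reference, the kind of Monte Carlo experiment referred to above can be sketched in
Python. This is an illustration under assumptions: the form of D is taken to mirror
the statistic of Section 7.4 with squared values in place of absolute products, the
input is Gaussian white noise rather than DWT coefficients of a long memory process,
and the function names are hypothetical.

```python
import numpy as np

def cusum_of_squares_D(w):
    """D = max(D+, D-) from the normalized cumulative sum of squares of w."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    p = np.cumsum(w ** 2)[:-1] / np.sum(w ** 2)   # P_k for k = 1, ..., N-1
    k = np.arange(1, n)
    d_plus = np.max(k / (n - 1) - p)
    d_minus = np.max(p - (k - 1) / (n - 1))
    return max(d_plus, d_minus)

def monte_carlo_quantiles(n, probs, reps=5000, seed=0):
    """Empirical quantiles of D under the null of constant variance (white noise)."""
    rng = np.random.default_rng(seed)
    stats = np.array([cusum_of_squares_D(rng.standard_normal(n)) for _ in range(reps)])
    return np.quantile(stats, probs)

# Upper quantiles for a sample of 64 values (e.g. one level of coefficients).
quantiles = monte_carlo_quantiles(64, [0.90, 0.95, 0.99], reps=2000)
```

Each sample size needs its own simulation, which is the practical cost that analytical
expressions for the quantiles would remove.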

7.2 The Schwarz Information Criterion


Chen and Gupta (1997) proposed an information criterion based approach to testing
and locating variance change points. Let $X_1, \ldots, X_N$ be a sequence of Gaussian
random variables with parameters $(\mu_1, \sigma_1^2), (\mu_2, \sigma_2^2), \ldots, (\mu_N, \sigma_N^2)$.
Assume that $\mu_1 = \mu_2 = \cdots = \mu_N = 0$, which is true when testing wavelet
coefficients. The Schwarz Information Criterion (SIC), also known as the Bayesian
Information Criterion (BIC), is defined to be $-2 \log L(\hat{\theta}) + p \log N$, where
$L(\hat{\theta})$ is the maximum likelihood function for the model, $p$ is the number of
free parameters in the model, and $N$ is the sample size. Two models are being compared,
the null hypothesis $H_0$ of Equation (4.6) and an alternative hypothesis $H_1$ with two
distinct variances, i.e.,
\[ H_1 : \sigma_1^2 = \cdots = \sigma_k^2 \neq \sigma_{k+1}^2 = \cdots = \sigma_N^2. \]
We reject $H_0$ if $\mathrm{SIC}(N) > \mathrm{SIC}(k)$ for some $k$ and estimate the position of the
change point $k_0$ by $\hat{k}$ such that
\[ \mathrm{SIC}(\hat{k}) = \min_{1 \le k < N} \mathrm{SIC}(k), \]
where $\mathrm{SIC}(N)$ is the SIC under $H_0$ and $\mathrm{SIC}(k)$ is the SIC under $H_1$ for
$k = 1, \ldots, N - 1$. Here
\[ \mathrm{SIC}(N) \equiv N \log 2\pi + N \log \hat{\sigma}^2 + N + \log N \]
and
\[ \mathrm{SIC}(k) \equiv N \log 2\pi + k \log \hat{\sigma}_k^2 + (N - k) \log \hat{\sigma}_{>k}^2 + N + 2 \log N, \]
where
\[ \hat{\sigma}^2 \equiv \frac{1}{N} \sum_{i=1}^{N} X_i^2, \quad
   \hat{\sigma}_k^2 \equiv \frac{1}{k} \sum_{i=1}^{k} X_i^2 \quad \text{and} \quad
   \hat{\sigma}_{>k}^2 \equiv \frac{1}{N - k} \sum_{i=k+1}^{N} X_i^2. \]
Since we require at least one sample in each estimate, we can only detect changes
for $2 \le k_0 \le N - 1$. To eliminate, or at least suppress, the possibility of random
fluctuations in the data contributing to the difference between the SICs, Chen and
Gupta (1997) introduced a significance level $\alpha$ and its associated critical value $c_\alpha$.
Hence, $H_0$ is rejected if $\mathrm{SIC}(N) > \mathrm{SIC}(k) + c_\alpha$ for some $2 \le k_0 \le N - 2$.
Approximate values of $c_\alpha$ can be obtained using the formula
\[ c_\alpha = \left\{ -\frac{1}{a(\log N)} \log \log \left[ 1 - \alpha + \exp\left( -2 e^{b(\log N)} \right) \right]^{-1/2}
   + \frac{b(\log N)}{a(\log N)} \right\}^2 - \log N \]
and found in Table 1 of Chen and Gupta (1997).
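
A minimal Python sketch of the SIC comparison described above is given here for
illustration. It assumes zero-mean data, treats the critical value as a number
supplied by the user (for example from Table 1 of Chen and Gupta (1997)), and uses
synthetic data; the function name sic_change_point is hypothetical.

```python
import numpy as np

def sic_change_point(x, c_alpha=0.0):
    """Compare SIC(N) with SIC(k), k = 1..N-1, for a single variance change.

    Returns (reject, k_hat, sic_null, sic_k), where k_hat is the number of
    observations in the first segment at the minimizing split.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    squares = x ** 2
    total = squares.sum()

    # Null model: one common variance, so one free parameter.
    sic_null = n * np.log(2 * np.pi) + n * np.log(total / n) + n + np.log(n)

    # Alternative model: variance changes after observation k, two free parameters.
    k = np.arange(1, n)
    left = np.cumsum(squares)[:-1]            # sum of squares of X_1..X_k
    right = total - left                      # sum of squares of X_{k+1}..X_N
    sic_k = (n * np.log(2 * np.pi)
             + k * np.log(left / k)
             + (n - k) * np.log(right / (n - k))
             + n + 2 * np.log(n))

    best = int(np.argmin(sic_k))
    reject = bool(sic_null > sic_k[best] + c_alpha)
    return reject, int(k[best]), float(sic_null), sic_k

# Synthetic example: the standard deviation jumps from 1 to 5 after 200 values.
rng = np.random.default_rng(4)
x = np.concatenate([rng.standard_normal(200), 5.0 * rng.standard_normal(200)])
reject, k_hat, sic_null, sic_k = sic_change_point(x)
```

Applied to DWT wavelet coefficients at a given level, k_hat would index coefficients
rather than times and would need to be mapped back to the original time axis.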


It would be of interest to compare the performance of this method to the cumulative
sum of squares testing procedure, proposed in Section 4.2.1, for detecting single
and multiple variance changes. A modification, similar to that in Section 4.5 using
the MODWT, would also allow this information criterion approach to estimate the
location of the variance change.

7.3 Refinement of the Multiple Variance Change Testing Procedure
At the present time, the procedure for detecting multiple variance changes in time
series is rather crude. The DWT is used solely for testing and the MODWT for locating
the variance change points. The information from one set of wavelet coefficients
is not used to influence the other procedure. In this sense, I am making a "leap of
faith" in that the DWT and MODWT wavelet coefficients will identically represent
features in the original time series. This is most likely not true, especially at higher
scales. If several peaks occur in the rotated cumulative variance, then it may well be
the case that the maximum value at scale j from the DWT and MODWT will not
correspond with the same location.
To ensure correspondence between the DWT and MODWT maxima, I could include
a logical statement which determines if the two locations are roughly equivalent.
If so, the location is kept as a significant variance change; otherwise the next highest
value from the MODWT could be selected and compared with the DWT maximum.
This is repeated until an agreement is reached between the two transforms.
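
One way such a logical statement might look is sketched below in Python. This is a
hypothetical illustration, not an implemented part of the procedure: it assumes the
DWT location has already been mapped to the original time axis, takes "next highest
value" to mean the next highest local peak of the MODWT statistic, and uses a
tolerance chosen by the user.

```python
import numpy as np

def local_maxima(stat):
    """Indices of interior local peaks of a one-dimensional sequence."""
    stat = np.asarray(stat, dtype=float)
    interior = np.arange(1, len(stat) - 1)
    is_peak = (stat[interior] > stat[interior - 1]) & (stat[interior] >= stat[interior + 1])
    return interior[is_peak]

def reconcile_locations(dwt_location, modwt_statistic, tolerance):
    """Return the highest MODWT peak within `tolerance` of the DWT location, else None."""
    stat = np.asarray(modwt_statistic, dtype=float)
    peaks = local_maxima(stat)
    ranked = peaks[np.argsort(stat[peaks])[::-1]]   # peaks from highest to lowest
    for candidate in ranked:
        if abs(int(candidate) - dwt_location) <= tolerance:
            return int(candidate)
    return None

# Toy statistic with peaks at indices 2, 6 and 10 (heights 5, 9 and 4).
toy_stat = [0, 1, 5, 1, 0, 2, 9, 2, 0, 3, 4, 3]
agreed = reconcile_locations(dwt_location=3, modwt_statistic=toy_stat, tolerance=2)
no_match = reconcile_locations(dwt_location=100, modwt_statistic=toy_stat, tolerance=2)
```

A sensible tolerance at scale j would presumably grow with the width of the level j
wavelet filter, since the DWT coefficients are subsampled by a factor of 2^j.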
A more appealing fix to this problem is to detect and locate variance change
points using the MODWT. This is not currently possible, since we lack an asymptotic
distribution for the test statistic when using MODWT wavelet coefficients. Monte
Carlo studies are possible when testing for a single variance change (and therefore
a fixed sample size), but the multiple testing procedure guarantees not knowing the
sample size after the first split, making Monte Carlo results difficult to utilize.
The use of an equivalent degrees of freedom argument has been shown to reasonably
approximate the distribution of the wavelet variance and has also proven sufficient, for
certain sample sizes, to modify the test statistic D computed with MODWT wavelet
coefficients.

7.4 Testing Homogeneity of Covariance


With the analysis of bivariate time series now possible using wavelet techniques, the
question of homogeneity of covariance arises. A cumulative sum of squares test
statistic may be useful for investigating departures from a constant association between
two time series. More work is needed to formulate specific statistical hypotheses.
Following Section 4.2.1, let
\[ Q_k \equiv \frac{\sum_{j=1}^{k} |X_j Y_j|}{\sum_{j=1}^{N} |X_j Y_j|}, \quad k = 1, \ldots, N - 1, \]
be the partial cumulative absolute covariance between two processes $\{X_t\}$ and $\{Y_t\}$
and define $D_{XY} \equiv \max(D_{XY}^{+}, D_{XY}^{-})$, where
\[ D_{XY}^{+} \equiv \max_{1 \le k \le N-1} \left( \frac{k}{N - 1} - Q_k \right) \quad \text{and} \quad
   D_{XY}^{-} \equiv \max_{1 \le k \le N-1} \left( Q_k - \frac{k - 1}{N - 1} \right). \]
As before, percentage points for the distribution of $D_{XY}$ under the null hypothesis
can be readily obtained through Monte Carlo simulations. In fact, the empirical
distribution of $D_{XY}$ appears to be similar to that of D (see Figure 7.1). Across all
scales, we see that the distribution of D is stochastically greater than that of $D_{XY}$,
especially at the right tail, and the difference increases as we go further into that
right tail. Plots of the empirical densities of each test statistic indicate that they are
skewed to the right and very similar in shape. Initially, Monte Carlo experiments should
be reasonable for approximating the distribution of $D_{XY}$. For the time being, single
changes of covariance are easily tractable but a reasonable approximation must be
found in order to test multiple changes of covariance on a scale by scale basis.
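
The statistic defined above, and a small Monte Carlo comparison with D, can be
sketched in Python. The sketch is illustrative and rests on assumptions: both
statistics are computed on independent Gaussian white noise rather than on DWT
coefficients from the LA(8) filter, D is taken to be the same construction applied to
squared values, and all function names are hypothetical.

```python
import numpy as np

def cusum_statistic(values):
    """max(D+, D-) for the normalized cumulative sum of non-negative values."""
    a = np.asarray(values, dtype=float)
    n = len(a)
    q = np.cumsum(a)[:-1] / a.sum()           # Q_k for k = 1, ..., N-1
    k = np.arange(1, n)
    return max(np.max(k / (n - 1) - q), np.max(q - (k - 1) / (n - 1)))

def d_variance(x):
    """D: cumulative sum of squares statistic for one series."""
    return cusum_statistic(np.square(np.asarray(x, dtype=float)))

def d_covariance(x, y):
    """D_XY: cumulative sum of absolute products for two series."""
    return cusum_statistic(np.abs(np.asarray(x, dtype=float) * np.asarray(y, dtype=float)))

# Monte Carlo under the null hypothesis with independent white noise inputs.
rng = np.random.default_rng(5)
n, reps = 128, 2000
d = np.array([d_variance(rng.standard_normal(n)) for _ in range(reps)])
d_xy = np.array([d_covariance(rng.standard_normal(n), rng.standard_normal(n))
                 for _ in range(reps)])

probs = [0.50, 0.90, 0.95, 0.99]
q_d = np.quantile(d, probs)                   # quantiles of D
q_dxy = np.quantile(d_xy, probs)              # quantiles of D_XY
```

Pairing q_d with q_dxy gives the points of a quantile-quantile comparison in the
spirit of Figure 7.1.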

[Figure 7.1 plot: six quantile-quantile panels, Level 1 through Level 6; horizontal axis "Variance Critical Values", vertical axis "Covariance Critical Values".]

Figure 7.1: Quantile-quantile plot comparing the Monte Carlo distributions of D
(horizontal axis) and $D_{XY}$ (vertical axis). The sample size of the original time series
was N = 1024, which was decomposed by the LA(8) wavelet filter using the DWT.
Each plot contains 5000 observations.

Previous work on the distribution of the product of two normally distributed
variables may be applicable here; see, e.g., Craig (1936) and Aroian (1947). Torrence
and Compo (1998) utilize the distribution of the square root of the product of two $\chi^2$
random variables when computing confidence intervals for their cross-wavelet power.

7.5 Equivalent Degrees of Freedom for the Wavelet Covariance


An equivalent degrees of freedom argument is used by Percival (1995) to specify
alternative confidence intervals for the wavelet variance. The idea of alternative
confidence intervals for the wavelet covariance may also be of interest. Priestley (1981,
pp. 695-696) mentions using the complex Wishart distribution for approximating the
distribution of the cross-spectral matrix. It is not immediately apparent if this is
useful when considering the wavelet covariance. Let $X_t$ and $Y_t$, $t = 0, \ldots, N - 1$,
be defined as in Section 5.2.1. Recall that the MODWT estimator of the wavelet
covariance, associated with scale $\lambda_j$, is defined to be
\[ \tilde{\gamma}_{XY}(\lambda_j) = \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1}
   \widetilde{W}^{(X)}_{j,l} \widetilde{W}^{(Y)}_{j,l} \]
(Equation (5.3)). Now define $\Gamma_{XY}(\lambda_j)$ to be the $2 \times 2$ sample wavelet
variance/covariance matrix computed using the MODWT, i.e.,
\[ \Gamma_{XY}(\lambda_j) \equiv
   \begin{bmatrix} \tilde{\gamma}_{XX}(\lambda_j) & \tilde{\gamma}_{XY}(\lambda_j) \\
                   \tilde{\gamma}_{YX}(\lambda_j) & \tilde{\gamma}_{YY}(\lambda_j) \end{bmatrix}
   = \begin{bmatrix} \tilde{\nu}^2_X(\lambda_j) & \tilde{\gamma}_{XY}(\lambda_j) \\
                     \tilde{\gamma}_{YX}(\lambda_j) & \tilde{\nu}^2_Y(\lambda_j) \end{bmatrix}. \]
Hence, the joint distribution of the elements of $\widetilde{N}_j \Gamma_{XY}(\lambda_j)$ is Wishart
(Johnson and Kotz 1972, Ch. 38). We are not interested in the distribution of
$\Gamma_{XY}(\lambda_j)$ per se, but in the marginal distribution of $\tilde{\gamma}_{XY}(\lambda_j)$ only.
The diagonal elements of $\Gamma_{XY}(\lambda_j)$ are a quadratic form of normal variables and
may be approximated by a $\chi^2$ distribution (cf. Section 3.4.2). For the off-diagonal
elements of $\Gamma_{XY}(\lambda_j)$, their distribution is not of the gamma type (Johnson and
Kotz 1972, p. 159). However, Goodman (1957) derived the asymptotic marginal and
joint distributions of bivariate spectral estimators.
These results may be adapted for use with our bivariate wavelet estimators.
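
For concreteness, the sample wavelet variance/covariance matrix can be computed as in
the following Python sketch. It is an illustration under assumptions: the Haar filter
(for which the level j filter width is 2^j) replaces the filters used in this
dissertation, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def haar_modwt_level(x, level):
    """Haar MODWT wavelet coefficients at a single level (circular boundary)."""
    v = np.asarray(x, dtype=float)
    w = v
    for j in range(1, level + 1):
        shifted = np.roll(v, 2 ** (j - 1))
        w = 0.5 * (v - shifted)
        v = 0.5 * (v + shifted)
    return w

def wavelet_cov_matrix(x, y, level):
    """2 x 2 matrix of MODWT wavelet variances (diagonal) and covariance (off-diagonal)."""
    width = 2 ** level                        # L_j for the Haar filter
    wx = haar_modwt_level(x, level)[width - 1:]   # drop boundary coefficients
    wy = haar_modwt_level(y, level)[width - 1:]
    n_tilde = len(wx)                         # N - L_j + 1 retained coefficients
    g_xy = np.sum(wx * wy) / n_tilde
    return np.array([[np.sum(wx ** 2) / n_tilde, g_xy],
                     [g_xy, np.sum(wy ** 2) / n_tilde]])

# Synthetic bivariate example with positive association.
rng = np.random.default_rng(6)
n = 1024
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)

gamma = wavelet_cov_matrix(x, y, level=3)
wavelet_correlation = gamma[0, 1] / np.sqrt(gamma[0, 0] * gamma[1, 1])
```

The matrix is symmetric and non-negative definite by construction, as a Wishart-type
approximation for its elements would require.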

7.6 Assessing Non-Gaussian/Non-Linear Processes


With respect to bivariate wavelet analysis of time series, I have given results solely for
certain Gaussian (normal) processes. This assumption greatly simplifies proofs which
require properties such as strict stationarity and/or finite moments. However, time
series analysis often does not have the "luxury" of analyzing only Gaussian processes.
Working under the assumption of a linear process (Hannan 1970, pp. 9-13) with not
necessarily Gaussian errors, or even more general processes, would greatly expand the
methodology provided in this dissertation.
Recently, Serroukh, Walden, and Percival (1998) have looked at the statistical
properties of the MODWT estimator of wavelet variance for non-Gaussian and
non-linear time series. A central limit theorem is established for the centralized wavelet
variance under the assumption of strict stationarity and a given number of finite
moments. Using the surface albedo measurements of pack ice from Lindsay et al.
(1996), they show that confidence intervals based on the Gaussian assumption are
much narrower than the ones under more general processes at smaller scales.

7.7 Final Comments


I believe that better statistical methods come from the necessity demanded by real
world problems. In this dissertation I have attempted to provide new and useful
methodology in order to analyze problems in time series analysis where techniques
were previously lacking. The two primary areas are detecting and locating (multiple)
changes of variance in time series with long memory structure using the DWT,
and extending wavelet analysis of variance for univariate time series to a wavelet
analysis of covariance for bivariate time series, i.e., introducing the wavelet
cross-covariance and cross-correlation. With respect to the former area, I have tried to
provide a more thorough investigation of ideas from the study of long memory
processes, change-point detection and the ability of the DWT to approximately
decorrelate time series on a scale by scale basis. The concepts of wavelet covariance and
correlation are natural extensions of the work by D. B. Percival and others on the
wavelet variance.
The DWT is a powerful mathematical tool, enabling statisticians to examine
much more complicated processes by separating features on a scale by scale basis.
However, it cannot be applied to problems without caution. Which wavelet filter to
use is a very important issue. To help select an appropriate wavelet filter, several
filters of various lengths should be applied to the data and visually analyzed to help
detect the potential leakage of low frequency features throughout the multiresolution
analysis. Most importantly, serious thought should be put into how the shape of
the underlying wavelet filter matches the physical process from which the data were
sampled. Keeping these issues in mind, there should be a great variety of problems
where wavelet analysis is beneficial.
BIBLIOGRAPHY
Abraham, B. and W. W. S. Wei (1984). Inferences about the parameters of a time
series model with changing variance. Metrika 31, 183-194.
Abry, P. and D. Veitch (1998). Wavelet analysis of long-range-dependent traffic.
IEEE Transactions on Information Theory 44 (1), 2-15.
Allan, D. W. (1966). Statistics of atomic frequency standards. Proceedings of the
IEEE 31, 221-230.
Anderson, J. R., D. E. Stevens, and P. R. Julian (1984). Temporal variations of the
tropical 40-50 day oscillation. Monthly Weather Review 112 (12), 2431-2438.
Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John
Wiley and Sons, Inc.
Aroian, L. A. (1947). The probability function of the product of two normally
distributed variables. The Annals of Mathematical Statistics 18, 265-271.
Balek, J. (1977). Hydrology and Water Resources in Tropical Africa, Volume 8 of
Developments in Water Science. New York: Elsevier Scientific Pub. Co.
Beran, J. (1994). Statistics for Long-Memory Processes, Volume 61 of Monographs
on Statistics and Applied Probability. New York: Chapman & Hall.
Beran, J. and N. Terrin (1996). Testing for a change of the long-memory parameter.
Biometrika 83 (3), 627-638.
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley
& Sons.
Bingham, C., M. D. Godfrey, and J. W. Tukey (1967). Modern techniques of power
spectrum estimation. IEEE Transactions on Audio and Electroacoustics 15 (2),
56-66.
Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. New
York: John Wiley & Sons.
Box, G. E. P. and G. M. Jenkins (1976). Time Series Analysis: Forecasting and
Control (2 ed.). Time Series Analysis and Digital Processing. San Francisco:
Holden Day.
Bradshaw, G. A. and T. A. Spies (1992). Characterizing canopy gap structure in
forests using wavelet analysis. Journal of Ecology 80 (2), 205-215.
Bretherton, C. S., M. Widmann, V. P. Dymnikov, J. M. Wallace, and I. Blade
(1998). Effective number of degrees of freedom of a spatial field. Submitted to
Journal of Climate.
Briggs, W. L. and V. E. Henson (1995). The DFT: An Owner's Manual for the
Discrete Fourier Transform. Philadelphia: Society for Industrial and Applied
Mathematics.
Brillinger, D. R. (1979). Confidence intervals for the crosscovariance function. In
Mathematical Statistics, Volume 5 of Selecta Statistica Canadiana, pp. 1-16.
Hamilton, Ontario: McMaster University Printing Services.
Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. Holden-Day
Series in Time Series Analysis. San Francisco: Holden-Day. Expanded edition.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (2 ed.).
New York: Springer-Verlag.
Brooks, C. E. P. (1949). Climate Through the Ages (2 ed.). New York: McGraw-Hill
Book Co.
Chen, J. and A. K. Gupta (1997). Testing and locating variance changepoints
with application to stock prices. Journal of the American Statistical
Association 92 (438), 739-747.
Chui, C. K. (1997). Wavelets: A Mathematical Tool for Signal Analysis. SIAM
Monographs on Mathematical Modeling and Computation. Philadelphia: Soci-
ety for Industrial and Applied Mathematics.
Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical
Statistics 7, 1-15.
Daubechies, I. (1992). Ten Lectures on Wavelets, Volume 61 of CBMS-NSF Re-
gional Conference Series in Applied Mathematics. Philadelphia: Society for
Industrial and Applied Mathematics.
David, F. N. (1966). Tables of the correlation coefficient. In E. S. Pearson and
H. O. Hartley (Eds.), Biometrika Tables for Statisticians (3 ed.), Volume 1.
Cambridge: Cambridge University Press.
Davies, R. B. and D. S. Harte (1987). Tests for Hurst effect. Biometrika 74, 95-101.
Davis, W. W. (1979). Robust methods for detection of shifts of the innovation
variance of a time series. Technometrics 21 (3), 313-320.
Donoho, D. L. (1993). Nonlinear wavelet methods for recovery of signals, densities,
and spectra from indirect and noisy data. In Proceedings of Symposia in Applied
Mathematics, Volume 47, pp. 173-205. American Mathematical Society.
Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Transactions on
Information Theory 41 (3), 613-627.
Donoho, D. L. and I. M. Johnstone (1994). Ideal spatial adaptation by wavelet
shrinkage. Biometrika 81 (3), 425-455.
Donoho, D. L., I. M. Johnstone, G. Kerkyacharian, and D. Picard (1995). Wavelet
shrinkage: Asymptopia? (with discussion). Journal of the Royal Statistical
Society B 57 (2), 301-369.
Fisher, R. A. (1915). Frequency distribution of the values of the correlation
coefficient in samples from an indefinitely large population. Biometrika 10, 507-521.
Fisher, R. A. (1929). Tests of significance in harmonic analysis. Proceedings of the
Royal Society of London, Series A 125, 54-59.
Fuller, W. A. (1996). Introduction to Statistical Time Series (2 ed.). New York:
Wiley-Interscience.
Giraitis, L. and R. Leipus (1995). A generalized fractionally differencing approach
in long-memory modeling. Lietuvos Matematikos Rinkinys 35 (1), 65-81.
Goodman, N. R. (1957). On the joint estimation of spectra, cospectrum and
quadrature spectrum of a two-dimensional stationary Gaussian process. Sci.
Paper No. 10. Engrng. Statist. Lab., New York Univ., New York.
Graf, H.-P. (1983). Long-range correlations and estimation of the self-similarity
parameter. Ph. D. thesis, Eidgenössische Technische Hochschule, Zürich.
Granger, C. W. J. and R. Joyeux (1980). An introduction to long-memory time
series models and fractional differencing. Journal of Time Series Analysis 1,
15-29.
Gray, B. M. (1988). Seasonal frequency variations of the 40-50 day oscillation.
Journal of Climatology 8, 511-519.
Haar, A. (1910). Zur Theorie der orthogonalen Funktionen-Systeme. Mathematische
Annalen 69, 331-371. In German.
Hannan, E. J. (1970). Multiple Time Series. New York: John Wiley and Sons, Inc.
Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68 (1), 165-176.
Hosking, J. R. M. (1984). Modeling persistence in hydrological time series using
fractional differencing. Water Resources Research 20 (12), 1898-1908.
Hsu, D.-A. (1977). Tests for variance shift at an unknown time point. Applied
Statistics 26 (3), 279-284.
Hsu, D.-A. (1979). Detecting shifts of parameter in gamma sequences with
applications to stock price and air traffic flow analysis. Journal of the American
Statistical Association 74, 31-40.
Hudgins, L., C. A. Friehe, and M. E. Mayer (1993). Wavelet transforms and
atmospheric turbulence. Physical Review Letters 71 (20), 3279-3282.
Hudgins, L. H. (1992). Wavelet Analysis of Atmospheric Turbulence. Ph. D. thesis,
University of California, Irvine.
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the
American Society of Civil Engineers 116, 770-779.
Imhof, J. P. (1961). Computing the distribution of a quadratic form in normal
variables. Biometrika 48, 419-426.
Inclan, C. and G. C. Tiao (1994). Use of cumulative sums of squares for
retrospective detection of changes of variance. Journal of the American Statistical
Association 89 (427), 913-923.
Isserlis, L. (1918). On a formula for the product-moment coefficient of any order
of a normal frequency distribution in any number of variables. Biometrika 12,
134-139.
Jarvis, C. S. (1936). Flood-stage records of the river Nile. Transactions of the
American Society of Civil Engineers 101, 1012-1071.
Jensen, M. J. (1994). Wavelet analysis of fractionally integrated processes.
Technical Report ewp-em/9405001, Department of Economics, Washington University.
Johnson, N. L. and S. Kotz (1970). Continuous Univariate Distributions. New York:
Houghton Mifflin.
Johnson, N. L. and S. Kotz (1972). Continuous Multivariate Distributions. New
York: John Wiley & Sons, Inc.
Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series
with missing observations. Technometrics 22 (3), 389-395.
Kawata, K. and S. Arimoto (1996). Signal matching using wavelet correlation.
Electronics and Communications in Japan 3 79 (9), 23-34. Translated from
Denshi Joho Tsushin Gakkai Ronbunshi, Vol. 78-A, No. 12, December 1995,
pp. 1655-1664.
Koopmans, L. H. (1974). The Spectral Analysis of Time Series. New York:
Academic Press.
Kotz, S., N. L. Johnson, and C. B. Read (Eds.) (1982). Encyclopedia of Statistical
Sciences. New York: Wiley.
Kuhnel, I. (1989). Spatial and temporal variations in Australo-Indonesian region
cloudiness. International Journal of Climatology 9 (4), 395-405.
Lawrence, A. J. and N. T. Kottegoda (1977). Stochastic modelling of riverflow time
series. Journal of the Royal Statistical Society A 140 (1), 1-47.
Leftus, V. (1986). Solar activity variations and climatic changes. Studia Geophysica
et Geodaetica 30 (1), 93-110.
Lehmann, E. L. (1983). Theory of Point Estimation. New York: Wiley.
Li, H. and T. Nozaki (1997). Application of wavelet cross-correlation analysis to
a plane turbulent jet. Japanese Society of Mechanical Engineers International
Journal, Series B 40 (1), 58-66.
Lindsay, R. W., D. B. Percival, and D. A. Rothrock (1996). The discrete wavelet
transform and the scale analysis of the surface properties of sea ice. IEEE
Transactions on Geoscience and Remote Sensing 34 (3), 771-787.
Madden, R. A. (1986). Seasonal variation of the 40-50 day oscillation in the tropics.
Journal of Atmospheric Science 43 (24), 3138-3158.
Madden, R. A. and P. R. Julian (1971). Detection of a 40-50 day oscillation in the
zonal wind in the tropical Pacific. Journal of Atmospheric Science 28, 702-708.
Madden, R. A. and P. R. Julian (1994). Observations of the 40-50 day tropical
oscillation: A review. Monthly Weather Review 122 (5), 814-837.
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The
wavelet representation. IEEE Transactions on Pattern Analysis and Machine
Intelligence 11 (7), 674-693.
Mandelbrot, B. B. and J. W. van Ness (1968). Fractional Brownian motions,
fractional noises and applications. SIAM Review 10 (4), 422-437.
Mandelbrot, B. B. and J. R. Wallis (1969). Some long-run properties of geophysical
records. Water Resources Research 5 (2), 321-340.
Mann, H. B. and A. Wald (1943). On stochastic limit and order relationships. The
Annals of Mathematical Statistics 14, 217-226.
McCoy, E. J. and A. T. Walden (1996). Wavelet analysis and synthesis of stationary
long-memory processes. Journal of Computational and Graphical
Statistics 5 (1), 26-56.
McCoy, E. J., A. T. Walden, and D. B. Percival (1998). Multitaper spectral esti-
mation of power law processes. IEEE Transactions on Signal Processing 46 (3),
199

655{668.
Mehrabi, A. R., H. Rassamdana, and M. Sahimi (1997). Characterization of
long-range correlations in complex distributions and profiles. Physical Review
E 56 (1), 712–722.
Mohr, D. L. (1981). Modeling Data as a Fractional Gaussian Noise. Ph. D. thesis,
Princeton University.
Nuri, W. A. and L. J. Herbst (1969). Fourier methods in the study of variance
fluctuations in time series analysis. Technometrics 11 (1), 103–113.
Ogden, R. T. (1994). Wavelet Thresholding in Nonparametric Regression with
Change-Point Applications. Ph. D. thesis, Texas A&M University.
Ogden, R. T. (1997). Essential Wavelets for Statistical Applications and Data
Analysis. Boston: Birkhäuser.
Ogden, R. T. and E. Parzen (1996). Change-point approach to data analytic
wavelet thresholding. Statistics and Computing 6 (2), 93–99.
Percival, D. B. (1983). The Statistics of Long Memory Processes. Ph. D. thesis,
Department of Statistics, University of Washington.
Percival, D. B. (1992). Simulating Gaussian random processes with specified
spectra. Computing Science and Statistics 24, 534–538.
Percival, D. B. (1993). Three curious properties of the sample variance and
autocovariance for stationary processes with unknown mean. The American
Statistician 47 (4), 274–276.
Percival, D. B. (1994). Spectral analysis of univariate and bivariate time series.
In J. L. Stanford and S. B. Vardeman (Eds.), Statistical Methods for Physical
Science, Volume 28 of Methods of Experimental Physics, pp. 313–348. Boston:
Academic Press, Inc.
Percival, D. B. (1995). On estimation of the wavelet variance. Biometrika 82 (3),
619–631.
Percival, D. B. and P. Guttorp (1994). Long-memory processes, the Allan variance
and wavelets. In E. Foufoula-Georgiou and P. Kumar (Eds.), Wavelets in
Geophysics, Volume 4 of Wavelet Analysis and its Applications, pp. 325–344. San
Diego: Academic Press, Inc.
Percival, D. B. and H. O. Mofjeld (1997). Analysis of subtidal coastal sea
level fluctuations using wavelets. Journal of the American Statistical
Association 92 (439), 868–880.
Percival, D. B. and A. T. Walden (1993). Spectral Analysis for Physical
Applications: Multitaper and Conventional Univariate Techniques. Cambridge:
Cambridge University Press.
Percival, D. B. and A. T. Walden (1999). Wavelet Methods for Time Series
Analysis. Cambridge: Cambridge University Press. Forthcoming.
Popper, W. (1951). The Cairo Nilometer, Volume 12 of Publications in Semitic
Philology. Berkeley: University of California Press.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992).
Numerical Recipes in C: The Art of Scientific Computing (2 ed.). Cambridge:
Cambridge University Press.
Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic
Press, Inc.
Rice, S. O. (1945). Mathematical analysis of random noise, part III: Statistical
properties of random noise currents. Bell System Technical Journal 24, 46–156.
Riedel, K. S. and A. Sidorenko (1995). Minimum bias multiple taper spectral
estimation. IEEE Transactions on Signal Processing 43 (1), 188–195.
Schuster, A. (1898). On the investigation of hidden periodicities with application to
a supposed 26-day period of meteorological phenomena. Terrestrial Magnetism 3,
13–41.
Serroukh, A., A. T. Walden, and D. B. Percival (1998). Statistical properties of
the wavelet variance estimator for non-Gaussian/non-linear time series.
Technical Report 98–03, Department of Mathematics, Imperial College of Science,
Technology & Medicine.
Slepian, D. (1978). Prolate spheroidal wave functions, Fourier analysis, and
uncertainty – V: The discrete case. Bell System Technical Journal 57, 1371–1430.
Srivastava, M. S. (1993). Comparison of CUMSUM and EWMA procedures for
detecting a shift in the mean or an increase in the variance. Journal of Applied
Statistical Science 1 (4), 445–468.
Stephens, M. A. (1970). Use of the Kolmogorov–Smirnov, Cramér–von Mises and
related statistics without extensive tables. Journal of the Royal Statistical
Society B 32 (1), 115–122.
Stephens, M. A. (1986). Tests based on EDF statistics. In R. B. D'Agostino and
M. A. Stephens (Eds.), Goodness-of-Fit Techniques, Volume 68 of STATISTICS:
Textbooks and Monographs, pp. 97–193. New York: Marcel Dekker.
Tewfik, A. H. and M. Kim (1992). Correlation structure of the discrete wavelet
coefficients of fractional Brownian motion. IEEE Transactions on Information
Theory 38 (2), 904–909.
Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. Proceedings
of the IEEE 70 (9), 1055–1096.
Titchmarsh, E. C. (1939). The Theory of Functions (2 ed.). Oxford: Oxford
University Press.
Torrence, C. and G. P. Compo (1998). A practical guide to wavelet analysis.
Bulletin of the American Meteorological Society 79 (1), 61–78.
Toussoun, O. (1925). Mémoire sur l'histoire du Nil. In Mémoires à l'Institut
d'Égypte, Volume 18, pp. 366–404.
Tsay, R. S. (1988). Outliers, level shifts, and variance changes in time series. Journal
of Forecasting 7, 1–20.
Tukey, J. W. (1949). The sampling theory of power spectrum estimates. In
Symposium on Applications of Autocorrelation Analysis to Physical Problems, pp.
47–67. Office of Naval Research, Department of the Navy, Washington, U.S.A.
Verner, M. (1972). Periodical water-volume fluctuations of the Nile. Archiv
Orientální 40 (2), 105–123.
Vetterli, M. and J. Kovačević (1995). Wavelets and Subband Coding. New Jersey:
Prentice Hall PTR.
Walden, A. T. (1994). Interpretation of geophysical borehole data via interpolation
of fractionally differenced white noise. Applied Statistics 43 (2), 335–345.
Walden, A. T. and R. E. White (1990). Estimating the statistical bandwidth of a
time series. Biometrika 77, 699–707.
Walker, G. T. (1928). World weather. Monthly Weather Review 56, 167–170.
Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika 82 (2),
385–397.
Wang, Y., J. E. Cavanaugh, and C. Song (1997). Self-similarity index estimation via
wavelets for locally self-similar processes. Department of Statistics, University
of Missouri.
Wichern, D. W., R. B. Miller, and D.-A. Hsu (1976). Changes of variance in
first-order autoregressive time series models – with an application. Applied
Statistics 25 (3), 248–256.
Wickerhauser, M. V. (1994). Adapted Wavelet Analysis from Theory to Software.
Wellesley, Massachusetts: A K Peters, Ltd.
Wornell, G. W. (1993). Wavelet-based representations for the 1/f family of fractal
processes. Proceedings of the IEEE 81 (10), 1428–1450.
Wornell, G. W. (1996). Signal Processing with Fractals: A Wavelet Based Approach.
New Jersey: Prentice Hall.
Appendix A
FOURIER THEORY AND FILTERING
Wavelet methodology shares the basic goal of its Fourier cousin: to transform
signals into a different domain so that interesting features may be brought to the
surface. This is done by using basis functions that differ from the sines and cosines
utilized by the discrete Fourier transform (DFT). Because wavelet methodology was
developed much later, its notation and concepts borrow a great deal from the well
established fields of filtering and Fourier analysis. We now outline the basic concepts
and notation that are used repeatedly in this dissertation.

A.1 The Discrete Fourier Transform


Even though the Fourier transform comes in several varieties, we present a synopsis
of the discrete time/continuous frequency flavor as given in Percival and Walden
(1999, Ch. 2); see Percival and Walden (1993, Ch. 3) for a discussion of all types
of the Fourier transform and related issues. This version of the Fourier transform
facilitates the analysis of discrete sequences of observations (time series), for example,
$\{x_t \mid t = 0, \pm 1, \pm 2, \ldots\}$. Let us further restrict ourselves to square-summable
sequences; i.e., $\sum_{t=-\infty}^{\infty} |x_t|^2 < \infty$.
Let $X(\cdot)$ be a complex-valued function which defines the DFT (analysis) of $\{x_t\}$
via
$$X(f) \equiv \sum_{t=-\infty}^{\infty} x_t e^{-i2\pi ft}, \tag{A.1}$$
where $-1/2 < f < 1/2$ are frequencies. The function $X(\cdot)$ measures the association
between $\{x_t\}$ and the sequences $\{e^{-i2\pi ft}\}$. This is simply one choice of sequences
with which to analyze a time series of observations; the discrete wavelet transform
(DWT) uses a formula similar to Equation (A.1) that utilizes sequences which differ
fundamentally from the complex exponentials.
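The inverse DFT (synthesis), which recovers $\{x_t\}$ from $X(\cdot)$, is given by
$$x_t = \int_{-1/2}^{1/2} X(f) e^{i2\pi ft}\,df, \quad t = 0, \pm 1, \pm 2, \ldots.$$

This can be shown by substituting Equation (A.1) for $X(\cdot)$ to get
$$\begin{aligned}
\int_{-1/2}^{1/2} X(f) e^{i2\pi ft}\,df
  &= \int_{-1/2}^{1/2} \left( \sum_{t'=-\infty}^{\infty} x_{t'} e^{-i2\pi ft'} \right) e^{i2\pi ft}\,df \\
  &= \sum_{t'=-\infty}^{\infty} x_{t'} \int_{-1/2}^{1/2} e^{i2\pi f(t-t')}\,df.
\end{aligned}$$

The integral equals one when $t = t'$ and zero otherwise, thus the inverse DFT is
established. The relationship between $\{x_t\}$ and $X(\cdot)$ is summarized by calling them
a Fourier transform pair using the notation
$$\{x_t\} \longleftrightarrow X(\cdot).$$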
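
The Fourier transform pair can be checked numerically. The sketch below evaluates Equation (A.1) for a short, finitely supported sequence and then recovers the sequence with a Riemann sum over the frequency interval. The function names `dtft` and `inverse_dtft`, the test sequence and the grid size are assumptions made for illustration only; they do not come from the dissertation.

```python
import numpy as np

def dtft(x, t, f):
    """Evaluate X(f) = sum_t x_t exp(-i 2 pi f t) for a finitely supported sequence."""
    x = np.asarray(x)
    return np.exp(-2j * np.pi * np.outer(f, t)) @ x

def inverse_dtft(X, f, t):
    """Approximate x_t = int X(f) exp(i 2 pi f t) df by a Riemann sum on a uniform grid."""
    df = f[1] - f[0]
    return (np.exp(2j * np.pi * np.outer(t, f)) @ X) * df

# Hypothetical test sequence supported on t = -3, ..., 4.
t = np.arange(-3, 5)
x = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.25, 3.0, 1.0])

# Uniform grid covering -1/2 <= f < 1/2 with 512 points.
f = np.arange(-256, 256) / 512.0

X = dtft(x, t, f)
x_rec = inverse_dtft(X, f, t)
max_err = np.max(np.abs(x_rec - x))
```

Because the sequence has finite support, $X(\cdot)$ is a trigonometric polynomial and the Riemann sum on a sufficiently fine uniform grid reproduces the integral up to rounding error; for a general square-summable sequence the sum would only approximate it.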

A.2 Properties of the DFT


We state and prove several properties of the discrete Fourier transform which will
prove useful in later sections, not only concerning the spectral analysis of time series
but also the discrete wavelet transform.
Linearity The DFT of a linear combination of sequences is simply the linear
combination of their respective DFTs,
$$\alpha x_t + \beta y_t \longleftrightarrow \alpha X(f) + \beta Y(f).$$
This is easily verified by replacing $x_t$ with $\alpha x_t$ in Equation (A.1), noting that
the constant $\alpha$ does not depend on $t$ and therefore can be taken out of the summation.
Hence, the DFT of $\{\alpha x_t\}$ is $\alpha X(\cdot)$. The same argument is applied to $\beta y_t$ to
obtain its DFT, and linearity follows.

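Translation A shift in time in the original sequence multiplies its DFT
by a phase factor,
$$x_{t-\tau} \longleftrightarrow e^{-i2\pi f\tau} X(f).$$

The proof follows from a direct application of the DFT to $\{x_{t-\tau}\}$,
$$\begin{aligned}
\sum_{t=-\infty}^{\infty} x_{t-\tau} e^{-i2\pi ft}
  &= \sum_{t'=-\infty}^{\infty} x_{t'} e^{-i2\pi f(t'+\tau)} \\
  &= e^{-i2\pi f\tau} \sum_{t'=-\infty}^{\infty} x_{t'} e^{-i2\pi ft'} = e^{-i2\pi f\tau} X(f).
\end{aligned}$$

The converse, that is, a shift in frequency in the DFT of a sequence is equivalent
to multiplying that sequence by a phase factor, follows from a similar
argument applied to the inverse DFT,
$$\begin{aligned}
\int_{-1/2}^{1/2} X(f-\nu) e^{i2\pi ft}\,df
  &= \int_{-1/2}^{1/2} X(f') e^{i2\pi (f'+\nu)t}\,df' \\
  &= e^{i2\pi \nu t} \int_{-1/2}^{1/2} X(f') e^{i2\pi f't}\,df' = e^{i2\pi \nu t} x_t,
\end{aligned}$$
where the limits of integration are unchanged because $X(\cdot)$ is periodic with unit
period.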

Convolution The convolution of two sequences $\{x_t\}$ and $\{y_t\}$ is defined to
be
$$x * y_t \equiv \sum_{u=-\infty}^{\infty} x_u y_{t-u}, \quad t = 0, \pm 1, \pm 2, \ldots.$$

Convolving two sequences in time results in the multiplication of their respective
DFTs,
$$x * y_t \longleftrightarrow X(f) Y(f). \tag{A.2}$$
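
The proof is seen by applying the definitions to the left-hand side of
Equation (A.2) and evaluating the resulting expression
$$\begin{aligned}
\sum_{t=-\infty}^{\infty} x * y_t\, e^{-i2\pi ft}
  &= \sum_{t=-\infty}^{\infty} \left( \sum_{u=-\infty}^{\infty} x_u y_{t-u} \right) e^{-i2\pi ft} \\
  &= \sum_{u=-\infty}^{\infty} x_u \sum_{t=-\infty}^{\infty} y_{t-u} e^{-i2\pi ft} \\
  &= \sum_{u=-\infty}^{\infty} x_u \sum_{\tau=-\infty}^{\infty} y_\tau e^{-i2\pi f(\tau+u)} \\
  &= \left( \sum_{u=-\infty}^{\infty} x_u e^{-i2\pi fu} \right) \left( \sum_{\tau=-\infty}^{\infty} y_\tau e^{-i2\pi f\tau} \right) = X(f) Y(f).
\end{aligned}$$
A convolution of the DFTs results in the multiplication of the original sequences,
$$x_t y_t \longleftrightarrow X * Y(f),$$
where $X * Y(f) \equiv \int_{-1/2}^{1/2} X(f') Y(f - f')\,df'$. A proof of this statement is
similar to that of Equation (A.2).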
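
A small numerical check of Equation (A.2) may help fix ideas. The sketch below convolves two short sequences with `numpy.convolve` and compares the DFT of the result with the product of the individual DFTs. The helper `dtft`, the example sequences and the frequency grid are assumptions chosen for illustration; both sequences are taken to be supported on $t = 0, 1, \ldots$ and zero elsewhere.

```python
import numpy as np

def dtft(x, f):
    """X(f) = sum_t x_t exp(-i 2 pi f t) for a sequence supported on t = 0..len(x)-1."""
    t = np.arange(len(x))
    return np.exp(-2j * np.pi * np.outer(f, t)) @ x

# Hypothetical finitely supported sequences.
x = np.array([1.0, -2.0, 0.5])
y = np.array([0.25, 0.5, 0.25, 1.0])

# Full convolution: supported on t = 0..len(x)+len(y)-2.
conv = np.convolve(x, y)

f = np.linspace(-0.5, 0.5, 101)
lhs = dtft(conv, f)            # DFT of the convolution
rhs = dtft(x, f) * dtft(y, f)  # product of the DFTs
conv_err = np.max(np.abs(lhs - rhs))
```

The two sides agree to rounding error at every frequency on the grid, which is the content of the convolution property for these sequences.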

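Parseval's Relation The DFT is an energy preserving transform in that
$$\sum_{t=-\infty}^{\infty} x_t y_t^* = \int_{-1/2}^{1/2} X(f) Y^*(f)\,df, \tag{A.3}$$

where the asterisk denotes complex conjugation and $\{x_t\}$ and $\{y_t\}$ are
square-summable sequences with $\{x_t\} \longleftrightarrow X(\cdot)$ and
$\{y_t\} \longleftrightarrow Y(\cdot)$. If we let $x_t = y_t$, then we have Parseval's relation
$$\sum_{t=-\infty}^{\infty} |x_t|^2 = \int_{-1/2}^{1/2} |X(f)|^2\,df.$$

The proof of Equation (A.3) is done by substituting the definitions of the DFTs
$X(\cdot)$ and $Y(\cdot)$ in Equation (A.3) and evaluating the resulting integral
$$\begin{aligned}
\int_{-1/2}^{1/2} X(f) Y^*(f)\,df
  &= \int_{-1/2}^{1/2} \left( \sum_{t=-\infty}^{\infty} x_t e^{-i2\pi ft} \right) \left( \sum_{t'=-\infty}^{\infty} y_{t'}^* e^{i2\pi ft'} \right) df \\
  &= \sum_{t=-\infty}^{\infty} \sum_{t'=-\infty}^{\infty} x_t y_{t'}^* \int_{-1/2}^{1/2} e^{i2\pi f(t'-t)}\,df = \sum_{t=-\infty}^{\infty} x_t y_t^*,
\end{aligned}$$
since the integral of the complex exponential $e^{i2\pi f(t'-t)}$ over $[-1/2, 1/2]$ is one
if $t = t'$ and zero otherwise.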
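
Equation (A.3) can also be verified numerically for finitely supported sequences. In the sketch below the complex test sequences, the random seed and the grid size `m` are assumptions made for illustration; the frequency integral is replaced by an average over a uniform grid, which is exact here because the grid has more points than the sequences have terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
# Hypothetical complex-valued sequences supported on t = 0..n-1.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

t = np.arange(n)
m = 64                                   # number of grid frequencies, m > n - 1
f = np.arange(-m // 2, m // 2) / m       # uniform grid on [-1/2, 1/2)
E = np.exp(-2j * np.pi * np.outer(f, t))
X, Y = E @ x, E @ y

time_side = np.sum(x * np.conj(y))       # sum_t x_t y_t^*
freq_side = np.sum(X * np.conj(Y)) / m   # Riemann sum for int X(f) Y^*(f) df

energy_time = np.sum(np.abs(x) ** 2)     # Parseval's relation with y = x
energy_freq = np.mean(np.abs(X) ** 2)
```

Setting `y = x` reduces the first comparison to the second, mirroring the step from Equation (A.3) to Parseval's relation in the text.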

A.3 Filtering of Sequences


Here we present some notation and concepts taken from Percival and Walden (1999,
Ch. 2). The concept of convolution, presented in the previous section, is identical
to that of linear time-invariant filtering. If an input sequence $\{y_t\}$ is filtered by
$\{x_t\}$, then the output from the filter is $\{x * y_t\}$. The discrete wavelet transform
is implemented efficiently by filtering rather than by a series of matrix operations.
Notions from filtering theory appear whenever wavelet methodology is discussed
and, therefore, we present some key topics here.
A filter $\{h_t\}$ is also known as an impulse response sequence and its DFT $H(\cdot)$ is
called the transfer function. The transfer function characterizes the filter in terms
of the frequencies it captures; examples include high-pass, low-pass and band-pass
filters. As the names imply, a high-pass filter retains high frequencies and suppresses
low frequencies, a low-pass filter does the opposite, and a band-pass filter preserves
a specific range (or band) of frequencies while suppressing all other frequencies. Like
any complex-valued function, the transfer function for a given filter can be expressed
in polar notation as
$$H(f) = |H(f)| e^{i\theta(f)},$$
where $|H(f)|$ is the gain function and $\theta(f)$ is the phase function for the filter. We
will see squared gain functions for several wavelet filters in Section 3.1.
It is often the case that not one, but several filters are used to analyze a sequence.
A cascade of filters is a series of $J$ filters such that the output from the
first filter is the input to the second filter and so on. If $\{h_{j,t} \mid t = 0, \pm 1, \pm 2, \ldots\}$,
$j = 1, \ldots, J$, are a series of filters with transfer functions $H_j(\cdot)$, then the output from
the cascade of filters can be expressed as
$$y_t = \sum_{u=-\infty}^{\infty} h_u x_{t-u}, \quad t = 0, \pm 1, \pm 2, \ldots,$$
where $\{h_t\}$ is the equivalent filter for the cascade whose transfer function is given by
$$H(f) \equiv \prod_{j=1}^{J} H_j(f). \tag{A.4}$$
The output from the first filter $\{h_{1,t}\}$ has DFT $H_1(f) X(f)$. After applying the $J$
filters, the DFT of $\{y_t\}$ is $Y(f) \equiv H_1(f) H_2(f) \cdots H_J(f) X(f)$. Using Equation (A.4)
and the convolution property of the Fourier transform, $Y(f) = H(f) X(f)$ and,
therefore, $\{y_t\}$ is simply the convolution of $\{x_t\}$ with $\{h_t\}$.
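
The equivalence between a cascade and its single equivalent filter can be illustrated with a few lines of code. In the sketch below the three short filters, the input series and the helper `transfer_function` are assumptions chosen for illustration (they are not wavelet filters from the dissertation); all sequences are taken to be supported on $t = 0, 1, \ldots$.

```python
import numpy as np
from functools import reduce

def transfer_function(h, f):
    """H(f) = sum_t h_t exp(-i 2 pi f t) for a filter supported on t = 0..len(h)-1."""
    t = np.arange(len(h))
    return np.exp(-2j * np.pi * np.outer(f, t)) @ h

# Hypothetical cascade of J = 3 filters.
filters = [np.array([0.5, 0.5]),          # two-point average (low-pass)
           np.array([0.5, -0.5]),         # two-point difference (high-pass)
           np.array([0.25, 0.5, 0.25])]   # three-point smoother

rng = np.random.default_rng(1)
x = rng.standard_normal(50)

# Apply the filters one after another.
y_cascade = x
for h in filters:
    y_cascade = np.convolve(h, y_cascade)

# Equivalent filter: convolution of all impulse response sequences.
h_equiv = reduce(np.convolve, filters)
y_equiv = np.convolve(h_equiv, x)

# Equation (A.4): the equivalent transfer function is the product of the H_j.
f = np.linspace(-0.5, 0.5, 65)
H_product = np.prod([transfer_function(h, f) for h in filters], axis=0)
H_equiv = transfer_function(h_equiv, f)
```

Filtering once with `h_equiv` gives the same output as the sequential cascade, and the two transfer functions agree on the frequency grid.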
Appendix B
UNIVARIATE SPECTRAL ANALYSIS
B.1 Introduction
As with the Fourier transform, we also require concepts from the spectral analysis
of time series in order to better describe and understand wavelet methodology. The
topics described here can be found, using similar notation, within Percival and Walden
(1993) and, with much greater detail, within Priestley (1981).
Let us begin with the spectral representation theorem for a discrete parameter
stationary process. There exists an orthogonal process $\{Z(f)\}$ defined on the interval
$[-1/2, 1/2]$ such that
$$X_t \stackrel{\mathrm{ms}}{=} \int_{-1/2}^{1/2} e^{i2\pi ft}\,dZ(f) \tag{B.1}$$

for all integers $t$, where the equality is in the mean square sense. That is, the squared
norm of the difference between the left-hand side and right-hand side is zero. We
define $E\{|dZ(f)|^2\} \equiv dS^{(I)}(f)$ for all $|f| \le 1/2$, and call $S^{(I)}(\cdot)$ the integrated
spectrum of $\{X_t\}$. For our purposes here, we will assume the integrated spectrum is
differentiable everywhere with derivative $S(\cdot)$, so that
$$E\{|dZ(f)|^2\} = dS^{(I)}(f) = S(f)\,df.$$

The autocovariance sequence (acvs) of a stationary process $\{X_t\}$, with zero mean,
can be written as
$$s_\tau \equiv E\{X_t X_{t+\tau}\} = \int_{-1/2}^{1/2} S(f) e^{i2\pi f\tau}\,df.$$

Conversely, if $\{s_\tau\}$ is square-summable, the spectrum of $\{X_t\}$ can be defined in terms
of the acvs via
$$S(f) = \sum_{\tau=-\infty}^{\infty} s_\tau e^{-i2\pi f\tau}, \tag{B.2}$$
and therefore the two quantities form a Fourier transform pair
$\{s_\tau\} \longleftrightarrow S(\cdot)$. The spectrum $S(\cdot)$ possesses all the properties outlined in
Section A.2 and will be useful in proving various results for functions of wavelet
coefficients.

B.2 Spectral Estimation


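Suppose the time series $X_t$, $t = 1, \ldots, N$, is a realization of a portion of a zero mean
stationary process with spectral density function (sdf) $S(\cdot)$ and autocovariance
sequence $\{s_\tau\}$. Let $\{\hat{s}^{(p)}_\tau\} \longleftrightarrow \hat{S}^{(p)}(\cdot)$, where $\hat{s}^{(p)}_\tau$ is the usual biased
estimator of the autocovariance sequence; i.e.,
$$\hat{s}^{(p)}_\tau \equiv \frac{1}{N} \sum_{t=1}^{N-|\tau|} X_t X_{t+|\tau|} \quad \text{for } 0 \le |\tau| \le N-1,$$

and $\hat{s}^{(p)}_\tau = 0$ for $|\tau| \ge N$. The method of moments spectral estimator is the
periodogram
$$\hat{S}^{(p)}(f) = \sum_{\tau=-(N-1)}^{N-1} \hat{s}^{(p)}_\tau e^{-i2\pi f\tau} = \frac{1}{N} \left| \sum_{t=1}^{N} X_t e^{-i2\pi ft} \right|^2. \tag{B.3}$$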

The disadvantages of the periodogram are well documented; see, for example,
Percival and Walden (1993, p. 197). We will not concern ourselves with such matters,
except to point out the existence of one of several alternative spectral estimators:
the multitaper spectral estimator (Thomson 1982; Percival and Walden 1993, Ch. 7).
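
Returning briefly to Equation (B.3), the equality of its two forms can be confirmed numerically. The following sketch computes the periodogram once from the biased acvs estimator and once directly from the data. The function names, the simulated white noise series and the frequency grid are assumptions made for illustration only.

```python
import numpy as np

def biased_acvs(x):
    """Biased acvs estimator for lags 0..N-1 (zero-mean series assumed)."""
    n = len(x)
    return np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(n)])

def periodogram_from_acvs(x, f):
    """Left-hand form of (B.3): Fourier transform of the biased acvs estimator."""
    s = biased_acvs(x)
    tau = np.arange(1, len(x))
    # The estimator is symmetric in tau, so the sum reduces to a cosine series.
    return s[0] + 2.0 * np.cos(2 * np.pi * np.outer(f, tau)) @ s[1:]

def periodogram_direct(x, f):
    """Right-hand form of (B.3): squared modulus of the DFT of the data over N."""
    t = np.arange(1, len(x) + 1)
    return np.abs(np.exp(-2j * np.pi * np.outer(f, t)) @ x) ** 2 / len(x)

rng = np.random.default_rng(5)
x = rng.standard_normal(64)            # hypothetical zero-mean series
f = np.linspace(0.0, 0.5, 33)

p_acvs = periodogram_from_acvs(x, f)
p_direct = periodogram_direct(x, f)
```

The direct form is the one normally used in practice, since it avoids computing all $N$ lags of the acvs estimator.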
We introduce a set of $K$ orthonormal data tapers $\{h_{t,k} \mid t = 1, \ldots, N\}$, where $k$
ranges from 0 to $K-1$; i.e., $\sum_{t=1}^{N} h_{t,j} h_{t,k} = 1$ if $j = k$ and 0 if $j \ne k$. Examples
of common data tapers are the sine tapers (Riedel and Sidorenko 1995) and discrete
prolate spheroidal sequences (dpss) data tapers (Slepian 1978; Thomson 1982; Percival
and Walden 1993, Ch. 8). Sine tapers were designed to minimize the spectral window
bias and can be approximated well using the following closed form expression
$$h^{(\mathrm{sine})}_{t,k} = \left( \frac{2}{N+1} \right)^{1/2} \sin\left( \frac{(k+1)\pi t}{N+1} \right).$$

In contrast, the dpss data tapers minimize the spectral window sidelobes, using a
resolution bandwidth parameter $W$, and must be calculated using techniques such
as inverse iteration, numerical integration or a tridiagonal formulation (Percival and
Walden 1993, Ch. 8). The role of any data taper is to protect against leakage: the
sine tapers provide moderate leakage protection, whereas the dpss data tapers offer
adjustable leakage protection through the parameter $W$. In practice there is little
difference in the multitaper spectral estimators when using either data taper.
The typical multitaper spectral estimator is given by
$$\hat{S}^{(mt)}(f) \equiv \frac{1}{K} \sum_{k=0}^{K-1} \hat{S}^{(mt)}_k(f) \quad \text{with} \quad \hat{S}^{(mt)}_k(f) \equiv \left| \sum_{t=1}^{N} h_{t,k} X_t e^{-i2\pi ft} \right|^2.$$
Thus, the multitaper spectral estimator is the average of several direct spectral
estimators (more specifically, eigenspectra) using a set of orthonormal data tapers.
Multitaper spectral estimators overcome several of the inadequacies of the periodogram
and possess reasonable bias, variance and resolution properties.
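
As a concrete illustration of the estimator above, the sketch below builds the sine tapers from their closed form expression and averages the resulting eigenspectra. The function names `sine_tapers` and `multitaper`, the simulated series, the number of tapers and the frequency grid are assumptions for this example; dpss tapers would require a separate numerical routine and are not shown.

```python
import numpy as np

def sine_tapers(n, n_tapers):
    """Rows hold the sine tapers h_{t,k}, t = 1..N, for k = 0..K-1."""
    t = np.arange(1, n + 1)
    order = np.arange(1, n_tapers + 1)   # (k + 1) in the closed form expression
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(order, t) / (n + 1))

def multitaper(x, f, n_tapers):
    """Average of the K eigenspectra |sum_t h_{t,k} X_t exp(-i 2 pi f t)|^2."""
    n = len(x)
    t = np.arange(1, n + 1)
    tapered = sine_tapers(n, n_tapers) * x                  # shape (K, N)
    dft = tapered @ np.exp(-2j * np.pi * np.outer(t, f))    # shape (K, len(f))
    eigenspectra = np.abs(dft) ** 2
    return eigenspectra.mean(axis=0), eigenspectra

rng = np.random.default_rng(2)
x = rng.standard_normal(128)         # hypothetical zero-mean series
f = np.arange(0, 65) / 128.0         # frequencies from 0 to 1/2

s_mt, eigenspectra = multitaper(x, f, n_tapers=5)

# Gram matrix of the tapers; it should be the identity if they are orthonormal.
gram = sine_tapers(128, 5) @ sine_tapers(128, 5).T
```

Each taper sums to unit energy, so no further normalization by $N$ is needed in the eigenspectra.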

B.3 Equivalent Degrees of Freedom for a Spectral Estimator


We want to approximate the asymptotic distribution of our spectral estimate $\hat{S}(f)$
using a distribution of the form $a\chi^2_\nu$ because the true distribution is difficult to
determine (see, e.g., Priestley (1981, pp. 466–468)), where the constants $a$ and $\nu$ are
found by moment matching (Tukey 1949). Using the properties of the $\chi^2_\nu$ distribution,
we know
$$E\{\hat{S}(f)\} = E\{a\chi^2_\nu\} = a\nu \quad \text{and} \quad \mathrm{Var}\{\hat{S}(f)\} = \mathrm{Var}\{a\chi^2_\nu\} = 2a^2\nu.$$

Solving for $a$ and $\nu$, we obtain
$$\nu = \frac{2 \left( E\{\hat{S}(f)\} \right)^2}{\mathrm{Var}\{\hat{S}(f)\}} \quad \text{and} \quad a = \frac{E\{\hat{S}(f)\}}{\nu}. \tag{B.4}$$
If we express the spectral estimate as a lag window spectral estimator (Percival and
Walden 1993, Sec. 6.7), then we can rewrite Equation (B.4) as
$$\nu = \frac{2N\,\Delta t}{C_h \int_{-f_{(N)}}^{f_{(N)}} W_m^2(\phi)\,d\phi} \quad \text{and} \quad a = \frac{S(f)}{\nu}, \tag{B.5}$$

where $\Delta t$ is the sampling interval, $f_{(N)} \equiv 1/(2\,\Delta t)$ is the Nyquist frequency, $C_h$ is
a constant which depends on the data taper used (see Table 248 in Percival and
Walden (1993) for values of $C_h$) and $W_m(\cdot)$ is the smoothing window. Hence, the
quantity $\nu$ is the equivalent degrees of freedom of the spectral estimator $\hat{S}(f)$.
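
The moment matching in Equation (B.4) is simple enough to sketch directly. The function name `chi2_moment_match`, the chosen values of $a$ and $\nu$, and the simulated draws below are assumptions made for illustration; in practice the mean and variance of $\hat{S}(f)$ would come from theory or from the estimator at hand.

```python
import numpy as np

def chi2_moment_match(mean, var):
    """Return (nu, a) such that a * chi2_nu has the given mean and variance (B.4)."""
    nu = 2.0 * mean ** 2 / var
    a = mean / nu
    return nu, a

# Exact moments of a * chi2_nu with a = 0.5 and nu = 8: mean a*nu, variance 2*a^2*nu.
nu_exact, a_exact = chi2_moment_match(mean=0.5 * 8, var=2 * 0.5 ** 2 * 8)

# The same matching applied to sample moments of simulated draws.
rng = np.random.default_rng(3)
draws = 0.5 * rng.chisquare(df=8, size=200_000)
nu_hat, a_hat = chi2_moment_match(draws.mean(), draws.var(ddof=1))
```

With exact moments the function returns the parameters that generated them; with sample moments it returns estimates that fluctuate around those values.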
Appendix C
BIVARIATE SPECTRAL ANALYSIS
The material presented in the following sections closely follows an introduction to
bivariate spectral analysis in Percival (1994), and is a natural extension of univariate
topics found in Percival and Walden (1993) using similar notation. A more thorough
introduction to multivariate spectral analysis can be found in, for example, Koopmans
(1974), Priestley (1981) and Brillinger (1981).

C.1 Introduction
Let $\{X_t\}$ and $\{Y_t\}$ be zero-mean weakly stationary processes with spectral density
functions (autospectra) $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. The cross spectral density
function (csdf) of $\{X_t, Y_t\}$ is defined to be
$$S_{XY}(f) = \sum_{\tau=-\infty}^{\infty} C_{\tau,XY} e^{-i2\pi f\tau}, \quad -1/2 \le f \le 1/2,$$
where $\{C_{\tau,XY}\}$ is the cross covariance sequence (ccvs) given by
$$C_{\tau,XY} = \mathrm{Cov}\{X_t, Y_{t+\tau}\} = E\{X_t Y_{t+\tau}\}.$$
The complete spectral properties of a bivariate time series at frequency $f$ can be
summarized by the spectral matrix
$$\mathbf{S}(f) \equiv \begin{bmatrix} S_X(f) & S_{XY}(f) \\ S_{YX}(f) & S_Y(f) \end{bmatrix}. \tag{C.1}$$
Although this is not a symmetric matrix, there are numerous ways of expressing the
off-diagonal terms (Brillinger 1981, p. 23); i.e.,
$$S_{XY}(f) = S_{XY}^*(-f) = S_{YX}(-f) = S_{YX}^*(f).$$
Thus, the spectral matrix can be expressed in terms of three distinct quantities instead
of four:
$$\mathbf{S}(f) = \begin{bmatrix} S_X(f) & S_{XY}(f) \\ S_{XY}(-f) & S_Y(f) \end{bmatrix}.$$
Whereas the spectrum of a real valued process is real valued, since the
autocovariance sequence is symmetric about 0, the csdf (or cross spectrum) is usually
complex valued. This allows us to express $S_{XY}(\cdot)$ in Cartesian form as
$$S_{XY}(f) = R_{XY}(f) - iQ_{XY}(f),$$
where $R_{XY}(\cdot)$ is the co-spectrum and $Q_{XY}(\cdot)$ is the quadrature spectrum. It may also
be expressed in polar notation as
$$S_{XY}(f) = A_{XY}(f) e^{i\theta_{XY}(f)},$$

where $A_{XY}(f) \equiv |S_{XY}(f)|$ is the amplitude spectrum and $\theta_{XY}(\cdot)$ is the phase
spectrum. These new functions are all real valued and may be more easily handled
than the cross spectrum. The complex coherency
$$w_{XY}(f) = \frac{S_{XY}(f)}{\sqrt{S_X(f) S_Y(f)}} \tag{C.2}$$
depends upon both the cross spectrum and the autospectra for $\{X_t\}$ and $\{Y_t\}$. The
complex coherency is a complex valued frequency domain "correlation coefficient."
It measures the correlation in the random amplitudes assigned to the complex
exponentials with frequency $f$ in the spectral representations of $\{X_t\}$ and $\{Y_t\}$. The
quantity $|w_{XY}(f)|^2$ is called the magnitude squared coherence (msc) at the frequency
$f$. Thus, we have
$$|w_{XY}(f)|^2 = \frac{|S_{XY}(f)|^2}{S_X(f) S_Y(f)} = \frac{A_{XY}^2(f)}{S_X(f) S_Y(f)};$$
that is, the msc is a normalized version of the square of the cross-amplitude spectrum.
The msc captures the "amplitude" part of the cross spectrum, but completely ignores
its phase, so the msc and phase spectrum can be used together to summarize the
"information" in the complex valued cross spectrum.

C.2 Spectral Estimation


Let $X_t, Y_t$, $t = 1, \ldots, N$, be a realization of a portion of a zero mean stationary
process $\{X_t, Y_t\}$ with cross spectrum $S_{XY}(\cdot)$ and autospectra $S_X(\cdot)$ and $S_Y(\cdot)$,
respectively. Just as the periodogram was used in the univariate case (Section B.2),
the cross periodogram
$$\hat{S}^{(p)}_{XY}(f) = \sum_{\tau=-(N-1)}^{N-1} \hat{C}_{\tau,XY} e^{-i2\pi f\tau}$$

is utilized here to estimate the cross spectrum. The sample cross covariance sequence
is defined to be
$$\hat{C}_{\tau,XY} \equiv \frac{1}{N} \sum_{t} X_t Y_{t+\tau},$$
where the summation goes from $t = 1$ to $N - \tau$ for $\tau \ge 0$ and from $t = 1 - \tau$ to
$N$ for $\tau < 0$. The cross periodogram can also be written in a more computationally
friendly form as
$$\hat{S}^{(p)}_{XY}(f) = \frac{1}{N} \left( \sum_{t=1}^{N} X_t e^{-i2\pi ft} \right)^* \left( \sum_{t=1}^{N} Y_t e^{-i2\pi ft} \right), \tag{C.3}$$
where the asterisk denotes complex conjugation.
The multitaper estimator of the cross spectrum is given by
$$\hat{S}^{(mt)}_{XY}(f) = \frac{1}{K} \sum_{k=1}^{K} \left( \sum_{t=1}^{N} h_{k,t} X_t e^{-i2\pi ft} \right)^* \left( \sum_{t=1}^{N} h_{k,t} Y_t e^{-i2\pi ft} \right),$$
where $\{h_{k,t}\}$ is the $k$th-order data taper for a sequence of length $N$ normalized such
that $\sum_t h_{k,t}^2 = 1$, $k = 1, \ldots, K$ (cf. Section B.2). Thus, the multitaper estimators
for the phase spectrum and magnitude squared coherence are given by
$$\hat{\theta}^{(mt)}_{XY}(f) = \arg\left\{ \hat{S}^{(mt)}_{XY}(f) \right\} \quad \text{and} \quad \left| \hat{w}^{(mt)}_{XY}(f) \right|^2 = \frac{\left| \hat{S}^{(mt)}_{XY}(f) \right|^2}{\hat{S}^{(mt)}_X(f)\, \hat{S}^{(mt)}_Y(f)},$$
respectively. The phase spectrum $\hat{\theta}^{(mt)}_{XY}(\cdot)$ takes on values between $-\pi$ and $\pi$ and,
hence, is modulo $2\pi$. This can lead to discontinuities around $\pm\pi$. Priestley (1981,
p. 709) describes a method to avoid these discontinuities by simultaneously plotting
the original estimate and translated versions of it.
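
The multitaper estimators of the cross spectrum, phase spectrum and msc can be sketched as follows. Sine tapers are used here purely for convenience, and the function names, simulated series, number of tapers and frequency grid are all assumptions made for this example rather than choices taken from the dissertation.

```python
import numpy as np

def sine_tapers(n, n_tapers):
    """Rows hold orthonormal sine tapers of length n."""
    t = np.arange(1, n + 1)
    order = np.arange(1, n_tapers + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(order, t) / (n + 1))

def multitaper_cross(x, y, f, n_tapers):
    """Multitaper cross spectrum, magnitude squared coherence and phase spectrum."""
    n = len(x)
    t = np.arange(1, n + 1)
    h = sine_tapers(n, n_tapers)
    e = np.exp(-2j * np.pi * np.outer(t, f))
    jx = (h * x) @ e                               # tapered DFTs of X, shape (K, len(f))
    jy = (h * y) @ e                               # tapered DFTs of Y
    s_xy = np.mean(np.conj(jx) * jy, axis=0)       # average over the K tapers
    s_x = np.mean(np.abs(jx) ** 2, axis=0)
    s_y = np.mean(np.abs(jy) ** 2, axis=0)
    msc = np.abs(s_xy) ** 2 / (s_x * s_y)
    phase = np.angle(s_xy)                         # values in (-pi, pi]
    return s_xy, msc, phase

rng = np.random.default_rng(4)
x = rng.standard_normal(256)
y = 0.8 * x + 0.6 * rng.standard_normal(256)       # hypothetical correlated pair
f = np.arange(1, 128) / 256.0

_, msc, phase = multitaper_cross(x, y, f, n_tapers=6)
_, msc_same, phase_same = multitaper_cross(x, 2.0 * x, f, n_tapers=6)
```

Averaging over several tapers is what makes the msc estimate informative: with a single taper the ratio is identically one at every frequency, whatever the two series are. When one series is a positive multiple of the other, as in the second call, the estimated msc is one and the estimated phase is zero.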
VITA
Brandon Whitcher was born in Carmel, California on August 1, 1971, and raised
in Yakima, Washington. He graduated from Carnegie Mellon University in 1993 with
a Bachelor of Science in Applied Mathematics (Statistics). He commenced studies at
the University of Washington in the fall of 1993, where he received an M.S. in Statistics
in 1995 and a Ph.D. in 1998. He is now working as a postdoc for EURANDOM, a
European research institute for the study of stochastic phenomena, in Eindhoven, the
Netherlands.
