
JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 104, NO. A5, PAGES 10,355-10,368, MAY 1, 1999

Time series, periodograms, and significance

G. Hernandez
Graduate Program in Geophysics, University of Washington, Seattle

Abstract. The geophysical literature shows a wide and conflicting usage of methods employed to extract meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. Therefore, we have undertaken a critical investigation of the tests and methodology used for determining the presence of statistically significant coherent oscillations in periodograms derived from time series. Statistical significance tests are only valid when performed on the independent frequencies present in a measurement. Both the number of possible independent frequencies in a periodogram and the significance tests are determined by the number of degrees of freedom, which is the number of true independent measurements present in the time series, rather than the number of sample points in the measurement. The number of degrees of freedom is an intrinsic property of the data, and it must be determined from the serial coherence of the time series. As part of this investigation, a detailed study has been performed which clearly illustrates the deleterious effects that the apparently innocent and commonly used processes of filtering, de-trending, and tapering of data have on periodogram analysis and the consequent difficulties in the interpretation of the statistical significance thus derived. For the sake of clarity, a specific example of actual field measurements containing unevenly spaced measurements, gaps, etc., as well as synthetic examples, have been used to illustrate the periodogram approach, and pitfalls, leading to the (statistical) significance tests for the presence of coherent oscillations. Among the insights of this investigation are: (1) the concept of a time series being (statistically) band limited by its own serial coherence and thus having a critical sampling rate which defines one of the necessary requirements for the proper statistical design of an experiment; (2) the design of a critical test for the maximum number of significant frequencies which can be used to describe a time series, while retaining intact the variance of the test sample; (3) a demonstration of the unnecessary difficulties that manipulation of the data brings into the statistical significance interpretation of said data; and (4) the resolution and correction of the apparent discrepancy in significance results obtained by the use of the conventional Lomb-Scargle significance test, when compared with the long-standing Schuster-Walker and Fisher tests.

Copyright 1999 by the American Geophysical Union.
Paper number 1999JA900026.
0148-0227/99/1999JA900026$09.00

1. Introduction

Most geophysical measurements are typically recorded as a time series, and these time series are then analyzed for the presence of (statistically) significant occurrences, later to be interpreted in terms of the underlying mechanisms. The literature shows a wide and conflicting usage of methods employed to obtain meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. The tests employed in the statistical phase of analysis are central to an investigation, and their properties and behavior need to be clearly understood in order to be properly utilized. In this paper we critically investigate statistical analyses of time series in terms of the sampling character of the time series proper, measurement uncertainties, available degrees of freedom, and adverse effects that (apparently) innocent data manipulation can produce in the final result. Specifically, this investigation deals with periodogram analysis and testing.

Since the introduction of the periodogram in 1898 [Schuster, 1898] for the investigation of hidden periodicities, this has become the standard search for statistically significant coherent oscillations in time series. The study by Walker [1914] on the criteria for statistical significance, or reality, of these periodicities placed the use of periodogram techniques on a sound statistical basis. Further investigations on the tests of significance were carried out by Fisher [1929], who derived the exact probability distribution and showed that it is not necessary to use the large-sample asymptotic assumptions required in the earlier tests.

Periodogram methods are used in a wide range of observational geophysical disciplines, as well as astronomical and meteorological studies, of which the investigations of Scargle [1982] and Hamilton and Garcia [1986] are examples. The usage of periodogram techniques has received contemporary attention since their introduction in the Numerical Recipes collection [Press et al., 1986]. However, unqualified use of the statistical significance test provided by them is likely to provide overoptimistic and incorrect conclusions.

The results obtained from periodogram significance analysis simply provide information on the (statistical) probability that no, or one or more, coherent periodicities may exist in a given data sample. This information is then used in interpreting the source, properties, and behavior of these oscillations. The results from the present investigation highlight the conditions necessary, along with some pitfalls, in obtaining a realistic estimate of the statistical significance of periodicities existing in a periodogram derived from a time series.

The following sections give the basic assumptions, derivations, and notation necessary for understanding and using statistical tests on a periodogram derived from a time series. The techniques used for handling unequally spaced data, missing points, and unequally weighted data are also presented. Following these, the periodogram, its statistical properties, and significance tests are described. A concrete example is used to illustrate the deductions. The effects of data serial coherence (affecting the degrees of freedom available) in time series data, and of tapering, filtering, and de-trending (all in the context of the periodogram results) are also discussed.

2. Time Series

In general, it is possible to fit a time series Y(t) with a function of the form:

$$Y(t_j) = Y_j = \sum_{k} \left[ a_k \cos(2\pi k T^{-1} t_j) + b_k \sin(2\pi k T^{-1} t_j) \right], \qquad k = 0, 1, 2, \ldots \tag{1}$$

By least squares methods, the general best fit (for any of the k coefficients) is given when $\langle \epsilon^2 \rangle$ is a minimum, where

$$\langle \epsilon^2 \rangle = \sum_{j=1}^{N} \left[ Y_j - a_k \cos(2\pi k T^{-1} t_j) - b_k \sin(2\pi k T^{-1} t_j) \right]^2, \tag{2a}$$

and

$$\frac{\partial \langle \epsilon^2 \rangle}{\partial a_k} = \frac{\partial \langle \epsilon^2 \rangle}{\partial b_k} = 0. \tag{3}$$

The solution is straightforward, the results being:

$$a_k = \frac{\sum_{j=1}^{N} Y_j \cos(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \cos^2(2\pi k T^{-1} t_j)} - b_k \frac{\sum_{j=1}^{N} \sin(2\pi k T^{-1} t_j) \cos(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \cos^2(2\pi k T^{-1} t_j)}, \tag{4a}$$

$$b_k = \frac{\sum_{j=1}^{N} Y_j \sin(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \sin^2(2\pi k T^{-1} t_j)} - a_k \frac{\sum_{j=1}^{N} \sin(2\pi k T^{-1} t_j) \cos(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \sin^2(2\pi k T^{-1} t_j)}. \tag{5a}$$

Equations (4a) and (5a) show the lack of orthogonality between the coefficients. This is not clearly noticeable in the usual full derivation of the coefficients, such as the following for $b_k$:

$$b_k = \frac{\sum_{j=1}^{N} Y_j \sin(2\pi k T^{-1} t_j) \sum_{j=1}^{N} \cos^2(2\pi k T^{-1} t_j) - \sum_{j=1}^{N} Y_j \cos(2\pi k T^{-1} t_j) \sum_{j=1}^{N} \sin(2\pi k T^{-1} t_j) \cos(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \sin^2(2\pi k T^{-1} t_j) \sum_{j=1}^{N} \cos^2(2\pi k T^{-1} t_j) - \left[ \sum_{j=1}^{N} \sin(2\pi k T^{-1} t_j) \cos(2\pi k T^{-1} t_j) \right]^2}. \tag{5b}$$

When the length of the function Y(t) is an integer multiple of the period T/k, for k = 1, 2, ..., and the $t_j$ are equally spaced, i.e., when $t_j = \Delta t \times j$, where $\Delta t = T M^{-1}$, M = 1, 2, ..., N/2, j = 1, 2, ..., N, and T = N, then the orthogonality rules apply:

$$\sum_{j=1}^{N} \cos(2\pi k T^{-1} t_j) \cos(2\pi n T^{-1} t_j) = \begin{cases} 0, & n \neq k \\ T/2, & n = k \neq 0 \\ T, & n = k = 0 \end{cases} \tag{6}$$

$$\sum_{j=1}^{N} \sin(2\pi k T^{-1} t_j) \sin(2\pi n T^{-1} t_j) = \begin{cases} 0, & n \neq k \\ T/2, & n = k \neq 0 \\ 0, & n = k = 0 \end{cases} \tag{7}$$

$$\sum_{j=1}^{N} \cos(2\pi k T^{-1} t_j) \sin(2\pi n T^{-1} t_j) = 0. \tag{8}$$

Thus, Equation (4a), Equation (5a), and Equation (5b) become:

$$a_k = \frac{\sum_{j=1}^{N} Y_j \cos(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \cos^2(2\pi k T^{-1} t_j)} = 2 T^{-1} \sum_{j=1}^{N} Y_j \cos(2\pi k T^{-1} t_j), \tag{9a}$$

$$b_k = \frac{\sum_{j=1}^{N} Y_j \sin(2\pi k T^{-1} t_j)}{\sum_{j=1}^{N} \sin^2(2\pi k T^{-1} t_j)} = 2 T^{-1} \sum_{j=1}^{N} Y_j \sin(2\pi k T^{-1} t_j). \tag{10}$$

Note, however, that $a_0$ and $b_0$ are special cases, where

$$a_0 = T^{-1} \sum_{j=1}^{N} Y_j ; \qquad b_0 = 0.$$

Usually, the zero-frequency coefficient is of no interest; so the time series used in this study is that obtained by subtracting the mean value of the original series, i.e., redefining $Y_j = (Y_j - a_0)$. This operation is a simple redefinition of the ordinate axis and has the crucial property of preserving the variance. Implicit in this statement is that the time series is a realization of a stationary process of at least order 2 [Priestley, 1981]. That is to say, it has the same mean and variance at all time points, and the covariance between the values at any two time points depends on the interval between these time points and not the location of the points along the time axis.
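The relation between the coupled solution of Equations (4a), (5a), and (5b) and the orthogonal shortcut of Equations (9a) and (10) can be checked numerically. The following is a minimal sketch, not the author's code: the function names, the noise-free test signal, and the choice of which points to drop are all assumptions made for illustration. On an evenly spaced grid with T = N both routes should agree at a natural frequency; once points are removed, only the coupled solution is expected to recover the coefficients.

```python
import math


def synth(t, k, T, a, b):
    """Noise-free single harmonic a*cos + b*sin at harmonic number k (illustrative signal)."""
    return [a * math.cos(2.0 * math.pi * k * tj / T)
            + b * math.sin(2.0 * math.pi * k * tj / T) for tj in t]


def harmonic_sums(t, y, k, T):
    """Return the five sums that appear in Equations (4a), (5a), and (5b)."""
    c = [math.cos(2.0 * math.pi * k * tj / T) for tj in t]
    s = [math.sin(2.0 * math.pi * k * tj / T) for tj in t]
    return {
        "yc": sum(yj * cj for yj, cj in zip(y, c)),
        "ys": sum(yj * sj for yj, sj in zip(y, s)),
        "cc": sum(cj * cj for cj in c),
        "ss": sum(sj * sj for sj in s),
        "sc": sum(sj * cj for sj, cj in zip(s, c)),
    }


def fit_general(t, y, k, T):
    """Solve the two coupled normal equations; b follows the form of Equation (5b)."""
    m = harmonic_sums(t, y, k, T)
    det = m["ss"] * m["cc"] - m["sc"] ** 2
    a = (m["yc"] * m["ss"] - m["ys"] * m["sc"]) / det
    b = (m["ys"] * m["cc"] - m["yc"] * m["sc"]) / det
    return a, b


def fit_orthogonal(t, y, k, T):
    """Shortcut of Equations (9a) and (10); assumes even spacing and a natural frequency."""
    m = harmonic_sums(t, y, k, T)
    return 2.0 * m["yc"] / T, 2.0 * m["ys"] / T


if __name__ == "__main__":
    N = 32
    T = float(N)
    t_even = [float(j) for j in range(1, N + 1)]
    y_even = synth(t_even, 3, T, 1.5, -0.7)
    print("even grid, coupled :", fit_general(t_even, y_even, 3, T))
    print("even grid, shortcut:", fit_orthogonal(t_even, y_even, 3, T))

    # Hypothetical gap pattern: keep only 14 of the 32 sample times.
    t_gap = [float(j) for j in (1, 2, 3, 5, 8, 9, 13, 14, 17, 21, 22, 26, 29, 31)]
    y_gap = synth(t_gap, 3, T, 1.5, -0.7)
    print("gapped, coupled    :", fit_general(t_gap, y_gap, 3, T))
    print("gapped, shortcut   :", fit_orthogonal(t_gap, y_gap, 3, T))
```

The gapped case is only meant to show why the shortcut cannot be carried over to irregular sampling; it does not include the weights or the phase shift used in the Lomb [1976] treatment.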
Under these circumstances, the $a_k$ and $b_k$ are orthogonal up to the Nyquist limit, i.e., $K = k_{max} \le N/2$. Because of their independence and orthogonality, these frequencies are known as the natural frequencies. As was pointed out earlier, the original data points are required to be equally spaced, and implicitly each point has the same uncertainty, $\sigma_j$ = constant. For the case where the time series is not evenly spaced, possibly because of missing data [Little and Rubin, 1987], or the data points have unequal weight, and the frequencies desired are not commensurate with the length of the data, then harmonic fitting techniques must be employed. Under these circumstances, it is possible to approach the normal least squares Fourier series. Specifically, it must be mentioned that in the present context, not evenly spaced data are meant to be irregularly spaced data. This fitting treatment is due to Lomb [1976]. The weight associated with the uncertainties is defined as $w_j = \sigma_j^{-2}$, where $\sigma_i \neq \sigma_j$. In addition, an arbitrary phase shift term $\tau_k$ is added to each of the trigonometric terms in Equation (2a). This is a simple redefinition of the axis, which does not alter the function. Thus:

$$\langle \epsilon^2 \rangle = \sum_{j=1}^{N} \left[ Y_j - a_k \cos(2\pi k T^{-1} (t_j - \tau_k)) - b_k \sin(2\pi k T^{-1} (t_j - \tau_k)) \right]^2 w_j. \tag{2b}$$

Treating Equation (2b) in the same manner as Equation (2a), we obtain the following result:

$$a_k = \frac{\sum_{j=1}^{N} Y_j w_j \cos[2\pi k T^{-1} (t_j - \tau_k)]}{\sum_{j=1}^{N} w_j \cos^2[2\pi k T^{-1} (t_j - \tau_k)]} - \frac{b_k \sum_{j=1}^{N} w_j \sin[2\pi k T^{-1} (t_j - \tau_k)] \cos[2\pi k T^{-1} (t_j - \tau_k)]}{\sum_{j=1}^{N} w_j \cos^2[2\pi k T^{-1} (t_j - \tau_k)]}. \tag{4b}$$

If we arbitrarily force the numerator of the second term of Equation (4b) to be zero, we have that:

$$b_k \sum_{j=1}^{N} w_j \sin[2\pi k T^{-1} (t_j - \tau_k)] \cos[2\pi k T^{-1} (t_j - \tau_k)] = 0. \tag{11}$$

It is then possible to solve for $\tau_k$:

$$\tau_k = T (4\pi k)^{-1} \tan^{-1} \left[ \frac{\sum_{j=1}^{N} w_j \sin(4\pi k T^{-1} t_j)}{\sum_{j=1}^{N} w_j \cos(4\pi k T^{-1} t_j)} \right], \tag{12}$$

which, when replaced in Equation (4b), gives the desired answer:

$$a_k = \frac{\sum_{j=1}^{N} w_j Y_j \cos[2\pi k T^{-1} (t_j - \tau_k)]}{\sum_{j=1}^{N} w_j \cos^2[2\pi k T^{-1} (t_j - \tau_k)]}. \tag{9b}$$

Although it is possible to obtain estimates of the variance for the determined coefficients, this does not directly give a measure of the importance of a particular frequency in the spectrum relative to the other possible independent frequencies present. Also, if a fitting method is used because of the irregular spacing, unequal variance, etc., in the data, then the orthogonality properties which assured independence of the resultant coefficients are no longer applicable. These properties must be investigated independently. Therefore, it becomes necessary to use other means to obtain a measure of the true number of degrees of freedom in the data series. This topic will be discussed later in Section 5. Implicit in the following discussion is the minimizing of the effects of discontinuities, or edge effects, of real data samples having limited length. 'Tapering' the data with an appropriate window is often employed to accomplish this purpose [Blackman and Tukey, 1959; Priestley, 1981; Percival and Walden, 1993]. This topic will also be discussed.

3. Periodogram

Schuster [1898] defined the periodogram as a measure of the relative power of a time series as a function of frequency. He was searching for 'hidden periodicities', or small periodic variations hidden behind irregular fluctuations. Here our notation will change to the more common usage of $\omega = 2\pi k T^{-1}$. The periodogram is defined [Priestley, 1981]:

$$I(\omega) = \left[ \sum_{j=1}^{N} Y_j \cos(\omega t_j) \right]^2 \times \left[ \sum_{j=1}^{N} \cos^2(\omega t_j) \right]^{-1} + \left[ \sum_{j=1}^{N} Y_j \sin(\omega t_j) \right]^2 \times \left[ \sum_{j=1}^{N} \sin^2(\omega t_j) \right]^{-1}, \tag{13}$$

which, in terms of the previous derivations, can be written as

$$I(\omega) = [A(\omega)]^2 + [B(\omega)]^2 = a^2(\omega) \sum_{j=1}^{N} w_j \cos^2(\omega t_j) + b^2(\omega) \sum_{j=1}^{N} w_j \sin^2(\omega t_j). \tag{14a}$$

For the case where the uncertainties of the data are the same (i.e., $w_j$ = constant) for all values and the data points are evenly spaced, it is easy to show that:

$$I(\omega) = N 2^{-1} [a^2(\omega) + b^2(\omega)] = N 2^{-1} c^2(\omega). \tag{14b}$$

This equation shows the 'multiplicative' effect of the periodogram [Priestley, 1981]. In this section it will be implicitly understood that the coefficients $a(\omega)$ and $b(\omega)$, and their associated quantities $A(\omega)$ and $B(\omega)$, are independent because of their orthogonal properties and/or their independently determined degrees of freedom of the time series.

For the null case, where the $Y_j$ consists of a sequence of independent random variables of zero mean and variance $\sigma_y^2$, the set of $A(\omega)$ and $B(\omega)$ of Equation (13) is a linear combination of the $Y_j$ set and has a multivariate normal distribution. The set of $I(\omega)$ has a distribution which is proportional to $\chi^2$ in 2 degrees of freedom. Thus:

$$I_p(\omega) = \sigma_y^2 \chi_2^2, \tag{15a}$$

where the $\chi_2^2$ distribution is a simple exponential distribution having a mean $\nu$ and variance $2\nu$. However, note that this is applicable only to the (maximum) number N/2 of independent $I(\omega)$, i.e., natural frequencies. Schuster [1898] tested the largest periodogram ordinate with the statistic $\gamma$:

$$\gamma = (I_p)_{max} \, \sigma_y^{-2}; \qquad 1 \le p \le N/2. \tag{16a}$$

In practical applications the variance $\sigma_y^2$ must be estimated, preferably with an unbiased estimate such as the expectation value, rather than with the sample variance $s_y^2$. For completeness, when the sample variance is replaced by $2 s_y^2$, the resulting
expression of $\gamma$ can be recognized as the conventional Lomb-Scargle (LS) periodogram statistic [Press et al., 1986], e.g.,

$$\gamma_{LS} = (I_p)_{max} \, (2 s_y^2)^{-1}; \qquad 1 \le p \le N/2. \tag{16b}$$

Therefore the statistical test consists of checking whether or not the value of $\gamma$ differs from zero at some significance level for the $\chi_2^2$ distribution, i.e., whether or not all $I_p = 0$. The probability distribution function of a $\chi_2^2$ is a simple exponential [Hoel, 1954]:

$$f(z) = 2^{-1} \exp(-z/2). \tag{17}$$

Hence, for any value of z > 0, the probability that $I_p / \sigma_y^2$ does not exceed z is given by:

$$p[(I_p / \sigma_y^2) \le z] = \int_0^z f(x) \, dx = 1 - \exp(-z/2). \tag{18a}$$

Under the null hypothesis that $\gamma$ represents one of the N/2 independently distributed exponential variables, then for any z:

$$p(\gamma > z) = 1 - p[(I_p / \sigma_y^2) \le z, \text{ for all } p] = 1 - [1 - \exp(-z/2)]^{N/2}. \tag{18b}$$

This makes it possible to test whether or not the largest value in a periodogram is statistically different from a zero mean distribution with variance $\sigma_y^2$. If such a nonzero peak exists, the distribution is unlikely to be random, and this is the end of the test. To be significant, $\gamma$ must exceed the critical test value of:

$$\gamma_c = -2 \ln(1 - P_c^{1/n}), \tag{19}$$

where n = N/2, or the possible number of natural frequencies, and $P_c$ is defined as the confidence level probability, 1 - p. However, this implies that the sample variance ($s_y^2$) is an unbiased estimate of the variance ($\sigma_y^2$). A better unbiased estimate of the variance can be obtained with the Parseval-Rayleigh theorem, that is, the power is equal in the time and frequency domains:

$$\sum_{p=1}^{N/2} I_p = \sum_{t=1}^{N} Y_t^2. \tag{20}$$

Accordingly, the expectation value of the left side is (for a zero mean value data series):

$$\left\langle \sum_{p=1}^{N/2} I_p \right\rangle = N \sigma_y^2. \tag{21}$$

Therefore, placing the above unbiased estimate of $\sigma_y^2$ into Equation (15a), we define g* and g:

$$g^* = (I_p)_{max} / \sigma_y^2 = \frac{(I_p)_{max}}{N^{-1} \sum_{p=1}^{N/2} I_p} = g N. \tag{15b}$$

For large N, when the sampling fluctuations of the denominator are negligible, then g* will asymptotically have the same distribution as $\gamma$ of Equation (16a). This asymptotic distribution is Walker's [1914] large sample test for $(I_p)_{max}$, with the same result as given in Equation (19). One should note that as the average power of a time series increases, as provided by the denominator of Equation (15b), the value of the statistic g* decreases. This result is indifferent as to whether the increase in power is due to the presence of noise or of signal. In fact, a spectrum too rich in signals asymptotically reaches the statistical properties of a noise spectrum and limits the amount of significant information that can be obtained from it.

Fisher [1929] derived an exact test for g of Equation (15a) by geometrical arguments, for N odd, later analytically proven by Grenander and Rosenblatt [1957]. Fisher showed that the probability of the power at one frequency over the total power of the set can be expressed (relative to the arbitrary level z) by:

$$p[g > z] = \sum_{i=1}^{a} (-1)^{i-1} \frac{n! \, (1 - i z)^{n-1}}{i! \, (n - i)!}, \tag{22}$$

where a is the largest integer less than $z^{-1}$. The first term of the expansion of Equation (22) can be recognized as the expansion of a simple exponential, such that for large values of n (n >> 10,000):

$$p[g^* > z] = 1 - [1 - \exp(-z/2)]^{n}, \tag{23}$$

which is the same result obtained in Equation (18b).

The critical value z is the highest value in the periodogram to be tested. For the approximation of Equation (23), at some critical value ($P_c$) of the confidence level probability the asymptotic value of z to be exceeded at that level is (see Equation (19)):

$$z_c = -2 \ln[1 - (P_c)^{1/n}]. \tag{24}$$

However, Whittle [1952] has proposed that the quantity $g_2^*$ could be used to continue the significance test to lesser amplitude peaks than the highest:

$$g_2^* = \frac{I_{p=2}}{N^{-1} \left[ \left( \sum_{p=1}^{N/2} I_p \right) - (I_p)_{max} \right]}, \tag{25}$$

where $I_{p=2}$ is the next smaller peak after $(I_p)_{max}$. This approach can be continued to the next peak, etc., as long as the probability is both statistically significant and less than the probability of the earlier peaks. The appropriate changes to the value of n in Equations (22) and (23) are also required. This procedure will give an estimate of K, the maximum number of periodic components possible in Equation (1). An objection can be raised to Equation (25), in that the denominator does not preserve the variance of the sample. This can be remedied by our defining a new quantity $g_m^*$:

$$g_m^* = \frac{I_{p=m}}{N^{-1} \sum_{p=1}^{N/2} I_p}. \tag{26}$$

In this modified expression, m represents the lesser amplitude peaks after the first one. In effect, this modification states that all those peaks above the critical levels of Equations (19) and (24) are to be considered to be statistically significant. It also gives a measure of K, the maximum number of periodic terms in Equation (1).

4. Real Data Example

The use of periodogram techniques to assess the statistical significance of the periodicities that may be present in a time series is better shown using a concrete example. Specifically, an actual, noisy, irregularly spaced, real world experimental sample with missing points has been used for this illustration. This sample consists of measurements of wind at 92 km height by a medium-frequency (MF) radar at Scott Base, Antarctica, during 11 days in August 1996. For the sake of convenience and generalization, the time scale of these measurements has been changed from hours to seconds. The route of using typical real data has been taken because the results were not known, thus avoiding their being prejudiced as they would be in a synthetic example, as well
as representing the usual case an investigator must face. This real data sample is shown in Figure 1. It goes almost without saying that many a synthetic example has been used to develop, test, and refine the numerical procedures employed here. Detailed results of the methodology using synthetic examples are given in the Appendix. Synthetic examples have been used sparingly in the main text and only when deemed necessary for clarity of presentation (see Figure 13).

Figure 2 illustrates the periodogram for the data. The original data have been given the unequal spacing Lomb [1976] treatment described earlier and, for the time being, it is presumed that the time series measurements are independent (in the statistical sense). As noted earlier in the time series section, in the process of calculating the periodograms the time series has been rendered stationary by subtracting its mean value. Figure 2 (bottom) illustrates the Schuster $\chi_2^2$ distribution significance test results and the conventional Lomb-Scargle test [Press et al., 1986], utilizing the sample variance as an estimate of the true variance, while Figure 2 (top) gives the Fisher-g significance test using the power as an unbiased estimate of the variance. As the reader can readily see, the measures of statistical significance obtained by the Fisher and the Schuster-Walker methods are in excellent agreement, while the conventional Lomb-Scargle method of testing significance [Press et al., 1986] shows strong disagreement with the other two. This discrepancy will be addressed at the end of this section. The significance tests shown have been made only for the largest periodogram ordinate. In Figure 2 the natural frequencies have been calculated, presuming that the longest period represents the fundamental frequency and the shortest period is twice the average separation between points. When the frequency of a 'hidden' periodicity falls between a pair of the natural frequencies, its amplitude is substantially reduced. For the extreme case, when the sought after periodicity falls midway between two neighboring ordinates, the height of such a hidden peak is reduced by $4\pi^{-2}$ [Whittle, 1952].

Figure 2. Periodogram of the data of Figure 1, where the statistical significance levels are shown for the (top) Fisher-g [Fisher, 1929] test and (bottom) both the Schuster [1898] and the conventional Lomb-Scargle [Press et al., 1986] tests. The natural frequencies are shown. [Panels: Fisher, with the Fisher-Walker 0.95 level; Schuster, with the Schuster-Walker 0.95 level. Abscissa: Frequency (Hertz).]

The obvious solution to the apparent graininess in the results is to have measured more points in the original data, i.e., more independent frequencies. However, this is not always possible. Another approach, which is numerically correct, is to calculate the periodogram amplitudes at closer spacing than the natural frequencies. This approach, shown in Figure 3, illustrates the possible frequencies available; but the independence necessary for the statistical tests is lost, and the statistical tests can no longer be made. However, based on this artifice, it is possible to redefine the fundamental frequency so as many as possible of the desired frequencies are included in the correct natural harmonic frequencies. After all, a priori, we do not know the real frequencies which are present in the data. This redefinition of the fundamental frequency can be achieved by cropping the data down to a suitable length. This is the approach which has been taken in the determination of the critical statistical level of Figure 3 (top). The significance tests in Figure 3 (bottom) are the Schuster-Walker and the conventional Lomb-Scargle [Press et al., 1986] levels, respectively. For consistency of presentation, the previous result is shown again.

Since the conventional Lomb-Scargle statistical significance test [Press et al., 1986] of the periodogram gives substantially different results than the Schuster-Walker and Fisher tests, the reason for this discrepancy requires further investigation. Because the three tests use the same basic quantities, one must look into the derivations of these statistical significance tests. The derivations for the Schuster-Walker and Fisher methods have been given previously; therefore those used in the conventional Lomb-Scargle will be examined now. Following Kendall [1948], the variance of $a_k$ of Equation (9a), for equally weighted and equally spaced points, can be shown to be equal to:

$$s_{a_k}^2 = 4 \sigma_y^2 T^{-2} \sum_{j=1}^{N} \cos^2(2\pi k T^{-1} t_j) = 2 \sigma_y^2 T^{-1}, \tag{27}$$

and the variance for $b_k$ is similarly obtained. Note that in the present derivations, T = N. From the definition of the normal distribution one finds the frequency function for two independent variables to be:

$$dF = [\sigma (2\pi)^{1/2}]^{-1} \exp[-x^2 (2\sigma^2)^{-1}] \, dx = (N^{-1} 4\pi \sigma_y^2)^{-1} \exp\{ -[a_k^2 + b_k^2] (4 \sigma_y^2 / N)^{-1} \} \, da_k \, db_k. \tag{28a}$$
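Equations (27) and (28a) lend themselves to a quick numerical check. The sketch below is illustrative only: the function names and test values are assumptions, and the sample times are taken as $t_j = j$ with T = N, as stated in the text. It evaluates the sum in Equation (27) directly and writes the joint density of Equation (28a) so that its normalization can be inspected.

```python
import math


def variance_of_a_k(sigma2_y, n_points, k):
    """Evaluate 4*sigma_y^2*T^-2 * sum_j cos^2(2*pi*k*t_j/T) with t_j = j and T = N.

    For a natural frequency with 0 < k < N/2 this should reduce to 2*sigma_y^2/T.
    """
    T = float(n_points)
    cos2_sum = sum(math.cos(2.0 * math.pi * k * j / T) ** 2
                   for j in range(1, n_points + 1))
    return 4.0 * sigma2_y * cos2_sum / T ** 2


def joint_density(a_k, b_k, sigma2_y, n_points):
    """Joint normal density of (a_k, b_k) written as in Equation (28a)."""
    scale = 4.0 * sigma2_y / n_points  # equals twice the variance of Equation (27)
    return math.exp(-(a_k ** 2 + b_k ** 2) / scale) / (math.pi * scale)


def integrate_density(sigma2_y, n_points, half_width=3.0, step=0.05):
    """Crude Riemann sum of the joint density over a square; should be close to 1."""
    n_steps = int(round(2.0 * half_width / step))
    grid = [-half_width + i * step for i in range(n_steps + 1)]
    total = sum(joint_density(a, b, sigma2_y, n_points) for a in grid for b in grid)
    return total * step * step


if __name__ == "__main__":
    print(variance_of_a_k(2.0, 64, 5), 2.0 * 2.0 / 64)
    print(integrate_density(1.0, 16))
```

The exponent of `joint_density` is the quantity that the following paragraph rewrites in terms of the periodogram ordinate.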
Figure 1. Noisy and unevenly spaced experimental data chosen for examination of hidden periodicities. [Axes: Signal versus Time (seconds).]

From the definitions given in Equations (14b) and (21) one can recognize, using the different estimates of $\sigma_y^2$, that the argument of the exponential is:
$$[a_k^2 + b_k^2] (4 \sigma_y^2 / N)^{-1} = I_p / (2 \sigma_y^2) = I_p \left( 2 N^{-1} \sum_{p=1}^{N/2} I_p \right)^{-1}, \tag{28b}$$

where the last term is half the value given in Equation (15b). A reader can recognize the center expression as the conventional Lomb-Scargle statistic, when $\sigma_y^2$ is replaced by $s_y^2$ [see Equation (16b)]. Using the best estimate of the variance, the appropriate Lomb-Scargle statistic to test is, therefore:

$$I_p \left( 2 N^{-1} \sum_{p=1}^{N/2} I_p \right)^{-1} < q, \tag{28c}$$

where q is an arbitrary number. Following Scargle [1982], when the $a_k$ and the $b_k$ of Equation (28a) are not orthogonal and independent, they do not have the same variance, and the probability is a simple exponential. That is, the probability that the statistic will not exceed this value is $P = (1 - e^{-q})^{N/2}$ for all the possible frequencies. From this, it is simple to show that:

$$\gamma_{LS} = -\ln(1 - P^{2/N}), \tag{29}$$

which is to be compared with $\gamma_c$ of Equation (19) and $z_c$ of Equation (24) and is the result given by Press et al. [1986] for the conventional Lomb-Scargle statistical significance test. Since the statistic in Equation (28c) and the test of Equation (29) are half the value of the relevant statistic and test in Equations (15b) and (24), the resultant significance tests are seen to be the same as the other two tests, albeit with a scale change. The results of the significance test just described are shown in Figure 4, indicating the equivalence of these two periodogram tests. It should be pointed out that this factor of 2 discrepancy applies strictly to the (now corrected) Lomb-Scargle statistic calculated using the expression of Equation (28c).

Figure 3. Same as Figure 2, except that many more frequencies than the natural frequencies have been calculated. [Panels: Fisher, with the Fisher-Walker 0.95 level; Schuster, with the Schuster-Walker 0.95 and Conventional Lomb-Scargle 0.95 levels. Abscissa: Frequency (Hertz).]

Figure 4. Statistical significance of the Schuster-Walker and the corrected Lomb-Scargle periodogram test methods. The statistic of Equation (28c) was employed for the Lomb-Scargle test. Note the equivalent results for the two methods. The natural frequencies are shown. [Panels: Schuster and Lomb-Scargle (corrected), each with a 0.95 level. Abscissa: Frequency (Hertz).]

This approach does resolve the discrepancy between the conventional Lomb-Scargle and both the Schuster-Walker and Fisher significance tests illustrated in Figures 2 and 3. Note, however, that it has now become necessary to calculate the complete natural frequency periodogram in order to obtain the correct Lomb-Scargle results, which is not the case in the Press et al. [1986] approach. As noted earlier, the variance $s_y^2$ used in the conventional Lomb-Scargle expression of Equation (16b) will not necessarily have the same numerical value as $\sigma_y^2$ derived from the expectation value in Equation (21), because of sampling fluctuations. Therefore a simple correction to the Press et al. [1986] conventional Lomb-Scargle statistic is not the answer since, for typical small data samples, the Lomb-Scargle statistic calculated using the two different approaches will not necessarily have the same value. In addition, the Lomb-Scargle approach in the Press et al. [1986] recipe incorrectly implies that any number of periodogram frequencies are possible. It is for this reason that the appropriate (natural frequency) statistic of Equation (28c) has been used here, and it is strongly suggested for use.

Because of the near equality of the three statistical significance testing methods (Schuster-Walker, Fisher, and the corrected Lomb-Scargle), the more exact Fisher [1929] method will be used in the following text. It should be noted that regardless of which significance test is employed, the problem of true degrees of freedom, which define the actual number of independent frequencies possible, has to be faced.

5. Degrees of Freedom

As remarked earlier, the sample variance is used in the Schuster-Walker and conventional Lomb-Scargle method as a measure of the true variance in order to carry out the $\chi_2^2$ test, while the Fisher test uses the power as an unbiased estimate for the variance. The sample variance is normally an optimistic estimate of the true variance, as it uses the number of data points (N) in the sample, rather than the number of degrees of freedom $\nu$ [Priestley, 1981]. Because of serial coherence in the original data, the number of points is seldom an appropriate estimate of the true number of degrees of freedom, as the samples may be drawn at time intervals too short to be independent [Leith, 1973; Harrison and Larkin, 1997]. Of course, as the sample size becomes large, the sample variance will asymptotically reach the
true variance. Thus, in particular for small samples, it becomes necessary to obtain an estimate of the number of the sample's degrees of freedom in order to have a proper estimate of the true variance. An estimate of the number of degrees of freedom can be obtained from the data by deriving the time between independent observations. This requires a knowledge of the autocorrelation of the data under consideration, which conveniently is available as the Fourier transform of the already determined power spectrum or periodogram. A simple Fourier transformation of the latter spectrum provides the desired information. Figure 5 shows the autocorrelation of the experimental data of Figure 1 near the first zero crossing.
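To make the autocorrelation and its first zero crossing concrete, the following sketch computes a sample autocorrelation by direct lagged products and locates the first zero crossing by linear interpolation. This is an illustration under stated assumptions, not the procedure used in the paper: it takes an evenly spaced series, uses lagged products rather than the Fourier transform of the periodogram mentioned above, and all function names and the cosine test series are hypothetical.

```python
import math


def demo_series(n_points, period):
    """Illustrative evenly spaced cosine with the given period in samples."""
    return [math.cos(2.0 * math.pi * j / period) for j in range(n_points)]


def autocorrelation(y, max_lag):
    """Sample autocorrelation r[0..max_lag], normalized so that r[0] = 1."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]  # remove the mean, as done in the text
    r0 = sum(v * v for v in d)
    return [sum(d[j] * d[j + lag] for j in range(n - lag)) / r0
            for lag in range(max_lag + 1)]


def first_zero_crossing(r, dt=1.0):
    """Lag (in time units) where r first reaches zero, or None if it never does."""
    for lag in range(1, len(r)):
        if r[lag] <= 0.0:
            prev = r[lag - 1]  # positive by construction
            frac = prev / (prev - r[lag])  # linear interpolation between lags
            return (lag - 1 + frac) * dt
    return None


if __name__ == "__main__":
    r = autocorrelation(demo_series(200, 20), 10)
    print("first zero crossing (samples):", first_zero_crossing(r))
```

For a pure cosine the crossing falls near a quarter of the period; for real, irregularly sampled data the lag grid and the estimator would need more care than this sketch provides.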
The two common techniquesto estimate the time between Figure6. Periodogramfor the natural frequenciesavailable
when only the true numberof degreesof freedomof the time
independentobservations[Harrison and Larkin, 1997] are the
seriesof Figure1 is takeninto account.Samefrequencyscaleas
useof the first zero crossingof the autocorrelation
of the dataand
Figure2 is usedin orderto emphasize the true informationcon-
the Leith [1973] technique.The Leith methodshowsthat a data tent of the time series. See text for details.
serieswithvariance(•2,whenfilteredwitha running meancon-
tainingM points,is expected
to havea variance(•}t = (•2 v-1.
Here $\nu$ is the effective number of independent points which are averaged together. The expression given by Leith [1973] and reported by Harrison and Larkin [1997] for a sampled series is:

$$ \sigma_{\bar{x}}^{2} = \sigma^{2}\nu^{-1} = \sigma^{2}M^{-1}\left[1 + 2\sum_{n=1}^{M}\left(1 - nM^{-1}\right)r_n\right], \qquad (30) $$

where $r_n$ are the elements of the autocorrelation of the data. The time between independent observations can then be derived [Harrison and Larkin, 1997]:

$$ \tau = T\nu^{-1} = 1 + 2\sum_{n=1}^{M}\left[1 - nM^{-1}\right]r_n. \qquad (31) $$

The number of degrees of freedom in the data is equal to the data length T divided by the time between independent observations. Estimation of the time between independent observations by the first zero crossing occurrence is an approximate, safe, and, usually, conservative overestimate. The Leith [1973] method is a more direct estimate and it also lends itself to a confidence interval calculation [Harrison and Larkin, 1997]. It should be noted that the autocorrelation of a time series consisting of random noise, i.e., having zero-mean amplitude with variance $\sigma^{2}$ and random phase, will have its zero crossing near the first lag. The degrees of freedom are then equal to N, the number of points in the sample.

A third method to estimate the number of degrees of freedom can also be directly derived from the power spectrum of the time series. Here one makes use of the property that a running mean operation is a convolution in the time plane, which in the frequency plane transforms into a product of a sinc function, i.e., sinc(x) = [sin($\pi$x)]/($\pi$x) [Bracewell, 1965], with the amplitude function. This can be trivially extended to the periodogram ordinate. As given in Equation (21), the sum of the power is an unbiased estimate of the variance; thus the sum of the power obtained after the running mean averaging will be smaller by the number of independent points which are averaged together. Therefore, the ratio of the power before application of the running mean operation over the power after the running mean will be a measure of the number of these independent points in the Leith [1973] sense.

Application of these methods to estimate the time between independent observations in the present data set, using the autocorrelation results shown in Figure 5, gives coherence times of 3.5 s and 4.44 s for the zero crossing and Leith [1973] methods, respectively. These coherence times translate into 73 and 58 degrees of freedom, which are to be contrasted with the original 235 points in the data sample. Therefore the critical level used in the significance tests has been overestimated for all the methods. Moreover, the limited number of degrees of freedom has automatically limited the number of independent (natural) frequencies possible in the periodogram. This has also automatically lowered the value of the Nyquist frequency. This result shows the fundamental limitation presented by the existence of a finite serial coherence time, namely, that the highest-frequency independent periodicity observable in a time series has a period of roughly twice this serial coherence time. This result is independent of the length and the fineness of the time grid of the time series. Any measurements made closer than the serial coherence time simply lead to oversampling, and no new information is acquired.
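The zero-crossing and Leith [1973] estimates of the time between independent observations, Equation (31), can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the data of Figure 1: it assumes evenly spaced samples with no gaps, and the function names, the `dt` sampling interval, and the optional `max_lag` truncation of the sum are all assumptions introduced here.

```python
def autocorrelation(x):
    """Biased sample autocorrelation r_n of x, normalised so that r_0 = 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / c0 for k in range(n)]


def coherence_time_zero_crossing(r, dt=1.0):
    """Time of the first lag at which the autocorrelation is no longer positive."""
    for lag, value in enumerate(r):
        if value <= 0.0:
            return lag * dt
    return len(r) * dt  # no crossing found within the available lags


def coherence_time_leith(r, n_points=None, max_lag=None, dt=1.0):
    """Equation (31): tau = 1 + 2 * sum_n (1 - n/M) r_n, in units of dt.

    n_points is M, the series length (defaults to len(r)).
    max_lag optionally truncates the sum; this truncation is an assumption
    of this sketch and is not part of Equation (31).
    """
    m = len(r) if n_points is None else n_points
    last = len(r) - 1 if max_lag is None else min(max_lag, len(r) - 1)
    total = sum((1.0 - n / m) * r[n] for n in range(1, last + 1))
    return dt * (1.0 + 2.0 * total)


def degrees_of_freedom(total_time, tau):
    """Data length T divided by the time between independent observations."""
    return total_time / tau
```

With noisy sample autocorrelations the unweighted tail of the sum can dominate the Leith estimate, which is why the sketch exposes `max_lag`; how far to carry the sum is left to the analyst.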
Figure 5. Autocorrelation plot of the time series given in Figure 1. Only the relevant information near the first zero crossing is shown. [Plot: autocorrelation versus lag (seconds).]

Figure 6 gives a new calculation of the periodogram for the real data of Figure 1 using the 29 frequencies actually available. Beside the coarseness of the spectrum, the natural frequencies no longer coincide with the spectral features, and the frequency range is considerably smaller than seen in Figure 2. This is a typical example of the limitations found in real world data, where the amount of information available for proper statistical testing is generally small. One should remark that the finite amount of this information is an intrinsic and fundamental property of the data, not of any process with which they are handled.

Earlier, we discussed the method of redefinition of the fundamental frequency as a means to include desired frequencies as harmonics of this fundamental period. Figure 7 shows the improved results obtained by doing such an adjustment. Notice
the change in the power near 0.1 Hz. These results are obtained by cropping the data by a small adjustment (8.1%) in their time length. Needless to say, this process cannot be carried out more than once, as the resulting determined frequencies would no longer be independent. Again, note that the critical level of significance in Figure 7, relative to Figure 2, has decreased because of the smaller number of degrees of freedom involved.

Figure 7. Periodogram with the number of natural frequencies allowed by the available degrees of freedom. The fundamental frequency has been redefined by cropping the original data 8.1% in its time length. Compare with the periodogram in Figure 6. [Plot: power versus frequency (Hertz), with the Fisher 0.95 level.]

6. Tapering

In Section 2 we mentioned the use of 'tapering' of a given time series in order to minimize edge effects of the limited length of the data series. This operation consists of reshaping the original time series with a smooth and symmetrical 'window' function which smoothly decreases, or tapers, from unity value at the middle to a zero value at the extremes of the data. In addition, the application of this symmetrical taper window enhances the cyclical behavior implicitly present in Fourier analysis. 'Pre-whitening', or the controlled addition of noise in order to decrease the dynamic range of a data set [Blackman and Tukey, 1959; Percival and Walden, 1993], is often used in conjunction with tapering.

Tapering can be better understood in terms of Fourier transforms, where the tapering consists of the multiplication of the time series by the window function in the time plane. This multiplication translates into a convolution of the two transforms of these functions in the frequency plane [Bracewell, 1965]. The convolution process has the effect of broadening (and smoothing) the resultant frequency plane function. Therefore the oscillatory character of the transform of the time series function which has been sharply cropped at the edges (i.e., having a limited length) becomes spread over frequency, and less noticeable, by the convolution operation. Details on tapering are found in most statistical references [Blackman and Tukey, 1959; Priestley, 1981; Percival and Walden, 1993], where they are discussed in terms of the (original function) Fejer kernel, its manipulation, and the resultant effects in the analysis. Here we will examine the effects that the tapering process has on our data sample.

As expected, there exist a large number of window functions and figures of merit associated with them [see Blackman and Tukey, 1959; Priestley, 1981; Press et al., 1986; Percival and Walden, 1993], and they all accomplish about the same result [Press et al., 1986]. A recommended [Press et al., 1986] practical window is the Welch window:

$$ x_W = 1 - \left[\frac{x_i - x_1 - x_c}{x_c}\right]^{2}, \qquad x_c = \frac{x_N - x_1}{2}, \qquad (32) $$

where $x_i$ is the abscissa and N is the number of points. Figure 8 shows the window and its product with the original data given in Figure 1. The decrease in amplitude at both ends of the data is noticeable. It must be remarked that diminishing the amplitude of the original information inevitably throws away some information.

Figure 8. Welch window and the original time series data of Figure 1 after their multiplication. Note the loss of amplitude in the data as the edges are approached. [Plot: window and signal versus time (seconds).]

Since the shape of a Welch window is fixed, it is useful to have a window whose slope and one-half power points are arbitrarily selectable. A hyper-Gauss window function is a good example of a variable shape window:

$$ x_G = \exp\left[-\left(x/G\right)^{2M}\right], \qquad G = x_{1/2}\,(\ln 2)^{-1/(2M)}. \qquad (33) $$

Here $x_{1/2}$ is the one-half power point and M is the (integer) number which defines the slope. The value of M is found by arbitrarily defining a value of x where the function has an arbitrary fractional transmission relative to the peak value. An example of this window is illustrated in Figure 9. As can be seen, by the appropriate choice of one-half power point and large values of M, this window will asymptotically approach a rectangular window. This hypothetical window becomes equivalent to the sampling window used to choose the sample series, i.e., no tapering. The hyper-Gauss window is useful since it has no ringing of its own to contribute to the final spectrum. By adjusting the one-half power width and the value of M so that ringing is minimal, one can optimize for a minimum loss of data when using this window.

The number of available degrees of freedom in the time series shown in Figures 8 and 9 is relatively unchanged from that of the original series, namely 57 and 56 for the Welch and hyper-Gauss windows, respectively. The periodograms derived from these data are given in Figures 10 and 11, respectively, where the time series length has been cropped so that the data frequencies match the periodogram natural frequencies. These periodograms are similar to the periodogram given in Figure 7, except for the apparent increased statistical significance at the lower frequency. This is especially obvious in the Welch window case. Closer examination, by calculating synthetic spectra, shows that the use of a tapering window changes the periodogram power distribution as a function of frequency and phase. This power distribution change is noticeable as an enhancement of the relative power in the first few harmonic terms, typically up to the tenth term for a Welch window. Although the use of these tapering windows has not noticeably altered the number of degrees of freedom, it has changed the distribution of power with frequency, leading to
results that incorrectly appear either to contain more statistically significant features than were present in the original data or to enhance those at the lower frequency region.

Figure 9. Hyper-Gauss window and the original time series data of Figure 1 after multiplication by this window. [Plot: window and signal versus time (seconds).]

Figure 10. Periodogram using the number of natural frequencies allowed by the available degrees of freedom with a Welch window. Note, in particular, the relative enhancement of the power at the lower frequency, when compared to Figure 7. Here, as in Figure 7, the fundamental frequency has been redefined by cropping the data length. [Plot: power versus frequency (Hertz), with the Fisher 0.95 level.]

Figure 11. Periodogram using the number of natural frequencies allowed by the available degrees of freedom with a hyper-Gauss window. As in Figure 7, the fundamental frequency has been redefined by cropping the data length. [Plot: power versus frequency (Hertz), with the Fisher 0.95 level.]

Pre-whitening after data have been obtained is mainly intended to avoid difficulty with the minor lobes of the spectral windows [Blackman and Tukey, 1959]. Effectively, the desired result is to change the original process distribution so that it has a 'nearly white' spectral density function [Priestley, 1981]. This can be accomplished by suitably filtering the signal, which may include the addition of white noise to the original information. Filtering, in this context, is understood to be a change of the relative contributions of the different portions of the spectrum. Depending on the character of the (arbitrary) filtering applied to the data, pre-whitening can either reduce or increase the serial coherence of the data, when compared with the original sample. Associated with such manipulation is the consequent increase or decrease in the degrees of freedom. As will be seen in Section 7, such arbitrary changes to the character and magnitude of the sample variance lead to unpredictable results, which cannot be statistically supported.

A secondary, but interesting, property of strong and symmetrical taper windows, such as the Welch window, is that they tend to enhance the cyclical behavior implicit in Fourier analysis. By enhancing such behavior of a time series, a given realization can sometimes approach stationarity to at least order 2. This is a necessary requirement for the statistical investigation of the data. However, the previously discussed difficulties with the use of taper windows remain.

7. Filtering and 'De-trending'

In the context of the periodogram material previously discussed, filtering amounts to the enhancement of a region of the spectrum relative to the rest of the spectrum. This filtering operation changes the relative power of a given feature in the spectrum and does not account for such a change. At the same time, filtering can change the serial coherence time of the data under consideration. As would be expected, the periodogram of such manipulated data would be quite different from the periodogram obtained from the original data. However, it is simply invalid to assign to the original data the results of significance tests performed on data whose power spectrum has been arbitrarily manipulated. In fewer words, the tests for statistical significance are no longer being made on the original data and therefore are not relevant to them.

A simple example of filtering is the use of a band-pass filter which selects a small region of a power spectrum and rejects all other information below and above the selected region, as shown in Figure 12. The selected region still only contains the same information and the same statistical significance it had when the full original spectrum was available, and no new information is gained by this filtering. In fact, if anything, both information and degrees of freedom would have been lost by the use of the filtering process. Filtering, such as the process described here, ignores the central thesis of significance testing; these tests clearly specify that the significance of any given feature's power must be measured relative to the rest of the power spectrum, as given in Equation (15b). Arbitrarily disposing of (or ignoring) the power associated with the rest of the spectrum as undesirable simply invalidates any significance tests performed on the filtered data. In other words, the statistical significance tests are not being performed on the original data and are no longer relevant to them. The results shown in Figure 12 can be obtained by either the product of the original spectrum and the filter transfer function or by only calculating the spectrum for the region of interest, which requires calculating the full power spectrum anyway, as given in Equation (15b). The results obtained by either of these methods are indistinguishable from each other.

Figure 12. Periodograms of the data of Figure 1, before and after filtering with a square filter of 0.025 Hz bandwidth centered at 0.26 Hz. Note that the information available on the feature near 0.26 Hz has not changed after the arbitrary filtering operation. The 0.95 significance level for the upper periodogram for the natural frequencies, presuming all data points are independent, is also shown. [Plot: two panels, power versus frequency (Hertz), with the Fisher 0.95 level.]

Filtering is a concept which is more applicable to the scenario in which data are being obtained continuously and the signal-to-noise ratio is continuously increasing. After some time has elapsed, an arbitrary coherent frequency present in the information stream reaches a desired statistically significant level. This time is practically provided by the 'time constant' of the filter and/or the coherent integration of the signal. However, when one has a fixed total measurement interval, the information is frozen within the data and no further knowledge is available (unless new information is provided, usually in the form of assumptions, etc.).

Although de-trending data as part of statistical analysis is not normally considered a filtering operation, its effects on the results can be substantial and very similar in character to those occurring in filtering. A trend is normally described as an amplitude variation in the time series which has a periodicity much longer than the length of the available data, and de-trending consists of arbitrarily removing this signal [Blackman and Tukey, 1959]. Therefore de-trending becomes a specialized method of filtering, with all its associated pitfalls.

In principle, it is possible to separate a 'trend' from the underlying information as long as the form of the trend is known. There arises the immediate question as to how a statistical test of significance performed on de-trended data relates to the original data. Since the power associated with the trend has been removed, one is no longer measuring the significance of a given feature relative to the full spectrum. Thus the results no longer refer to the original data and are likely to be invalid when applied to it. In addition, the de-trending process may substantially change the serial coherence of the signal and the associated number of degrees of freedom.

The similarity between filtering and de-trending can be easily shown by subjecting a synthetic time series to a de-trending operation. For this example we will employ the equal-amplitude six-frequency time series used in the Appendix. The de-trending operation used here is to 'difference' the time series [Priestley, 1981; Percival and Walden, 1993]. Differencing amounts to the calculation of the first difference of the data, which in effect obtains its first derivative.

From Fourier transform theory [Bracewell, 1965] it is known that the derivative of a signal in the real plane translates in the transform plane into the product of the original signal transform times the imaginary frequency abscissa. Using the symbol $\supset$ to indicate the Fourier transform operation in which a real plane function f(x) 'transforms into' F(s) in the Fourier plane, we now have [Bracewell, 1965]:

$$ f(x) \supset F(s), \qquad (34) $$

$$ f'(x) \supset i2\pi s F(s), \qquad (35) $$

where f'(x) denotes the first derivative of f(x), $i = (-1)^{1/2}$, and s is the Fourier plane abscissa. The latter is identified with frequency when x is in time units.

Equation (35) clearly shows that a differencing operation depresses the amplitude of the low-frequency components and increases the amplitude of the high-frequency components of the original signal's spectrum. This operation conforms to the general definition of filtering as an enhancement of a region of a spectrum relative to the rest of the spectrum. The upper panel of Figure 13 shows the periodogram of a raw synthetic series. Although the raw synthetic time series consists of six equal-amplitude oscillations (see Figure A2(a)), the periodogram shows two of these frequencies as missing because they are outside the Nyquist limit determined from the serial coherence of the data (see Figure A2(b)), and the surviving four frequencies as having unequal power. This latter effect is caused by the mismatch between the frequencies of the oscillations and the natural frequencies of the periodogram. The lower panel of Figure 13 shows the periodogram of the differenced synthetic time series. As expected from the derivation of Equation (35), the low frequencies of the differenced periodogram have been depressed while its high frequencies have been enhanced. At the same time the number of degrees of freedom of the 'differenced' series has increased, as can be seen by the increased frequency range of the lower panel periodogram in Figure 13, and the appearance of the sixth frequency with exaggerated power and enhanced noise.

Figure 13. Periodograms before and after de-trending a synthetic time series by differencing. The obvious change in the spectral character and degrees of freedom is noticeable. The 0.95 significance level is shown. [Plot: two panels, power versus frequency (Hertz).]

The results shown in Figure 13 clearly show the changes introduced by the difference procedure employed in de-trending. Three of the frequencies known to be statistically significant in the original data have disappeared, while two new frequencies not supported by the serial coherence of the original time series have appeared because of the increased degrees of freedom created by the differencing operation. Clearly, the lower panel periodogram in Figure 13 bears little similarity to the original raw synthetic data periodogram and cannot be described as representing the spectrum of the original time series. Thus, the statistical results derived from the examination of a differenced signal no longer refer to the original signal and therefore are not relevant to it.
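The frequency dependence implied by Equation (35) can be made concrete for sampled data. The sketch below is an illustration under stated assumptions, not the author's computation: it uses the standard result that a discrete first difference with sampling interval `dt` has amplitude gain 2|sin($\pi$ s dt)|, which approaches the 2$\pi$s factor of Equation (35) (times dt) at low frequency. The function names are introduced here for illustration only.

```python
import math


def first_difference(x):
    """De-trend by differencing: y_k = x_{k+1} - x_k."""
    return [b - a for a, b in zip(x, x[1:])]


def difference_gain(frequency, dt=1.0):
    """Amplitude gain of the first difference at `frequency` (cycles per unit time).

    Follows from |exp(i*2*pi*f*dt) - 1| = 2*|sin(pi*f*dt)|.
    """
    return 2.0 * abs(math.sin(math.pi * frequency * dt))


def power_gain(frequency, dt=1.0):
    """Corresponding gain in periodogram power (amplitude gain squared)."""
    return difference_gain(frequency, dt) ** 2


if __name__ == "__main__":
    # Periods of the six-oscillation synthetic series used in the Appendix.
    for period in (80.0, 25.0, 9.0, 4.0, 3.2, 2.2):
        print(f"period {period:5.1f} s  power gain {power_gain(1.0 / period):.4f}")
```

For one sample per second the 80-s oscillation is suppressed in amplitude by more than a factor of ten while the 4-s oscillation is amplified, which is the redistribution of power that the text describes.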
The same difference operation can be applied to the MF radar data shown in Figure 1. The results of the analysis are given in Figure 14. The upper panel is the original data periodogram (see Figure 7), while the lower panel shows the periodogram of the differenced time series. The differencing process has eliminated the significant features existing in the original time series. These two examples shown in Figures 13 and 14 clearly show the perils inherent in the manipulation of data when attempting to de-trend a time series. Although differencing is only one of the many methods used in de-trending data, the other methods lead to questionable results similar to those obtained in the examples given here. The de-trending operation is basically a specialized filtering operation which will likely lead to incorrect results when applied to the original data.

Both filtering and de-trending are simple data manipulations, in which power is arbitrarily removed from, added to, or shifted within the spectrum of the original information and where no accounting is made for this change. Not surprisingly, the results after any of these manipulations can provide almost any desired answer, which cannot be supported with a critical statistical test performed on the original data. Clearly, any manipulation of the original data which arbitrarily changes the character and magnitude of the sample variance will lead to unpredictable results which cannot be statistically supported. Therefore, in general, the statistical results obtained from a data set which has been filtered and/or de-trended should be suspected of being unreliable.

8. Summary and Discussion

In this study we have investigated in detail the requirements for obtaining statistically significant information on coherent oscillations present in a time series by using periodogram methods. To accomplish this, a background study of the requirements and mathematical techniques utilized to obtain a periodogram from a time series has been investigated, coupled with a study of the statistical tests employed to extract statistically significant information from the periodograms. The basic statistical tests are applicable only for the largest periodogram ordinate, while the Whittle [1952] suggestion (see Equation (25)) and our modified result of Equation (26) allow testing of the next largest ordinates and therefore make it possible to estimate the maximum number K of (statistically significant) periodic components with which to represent the time series.

For illustration, these mathematical methods have been applied to a real-world geophysical time series, but with the implicit presumption that all the individual measurements in the time series are independent. In this process a discrepancy between the Fisher and the Schuster-Walker significance tests and the conventional Lomb-Scargle [Press et al., 1986] significance test has been found and resolved. It is reassuring that these three tests, based on nearly the same assumptions, now give essentially the same measure of statistical significance. However, the resolution of this discrepancy requires using the statistic given in Equation (28c) for the Lomb-Scargle test, rather than the Press et al. [1986] formulation for this statistic.

The presumption that individual measurements in a time series are indeed independent deserves to be closely investigated since the statistical significance tests are only applicable to those periodogram frequencies which are truly independent. Thus one must ascertain the actual number of independent samples, or number of degrees of freedom $\nu$, present in the time series. The number of degrees of freedom is determined from the serial coherence, or 'memory', of individual data points about previous events in the time series. This information is available from the autocorrelation function of the time series at the time its first zero crossing occurs, which is called the serial coherence time of the process, and defines the time between independent observations. The number of degrees of freedom is simply given by the ratio of the time length of the series to the (determined) serial coherence time. The fundamental 'natural' frequency is defined as that frequency corresponding to the period associated with the length of the time series, and the harmonics are evenly spaced up to the Nyquist limit, as given by the $\nu/2$ possible frequencies.

Inherent to this discussion of natural or independent frequencies is the appearance of the concept of a critical sampling rate in the collection of data, defined by the serial coherence time. Data taken at sampling rates faster than this critical sampling rate are not independent and carry no new information with them. The parallel interpretation of a critical sampling rate is that a time series is intrinsically band limited as given by the Nyquist frequency associated with the serial coherence time. Finding this critical band-limiting frequency defines a central and necessary statistical requirement for the efficient design of an experiment. Obviously, the band-limiting frequency defines the shortest periodicity available for investigation in the time series. The above arguments are applicable when the number of degrees of freedom is smaller than the number of measurements in the time series; that is, the information is overdetermined by the sampling process represented by the time series. As an illustration of these deductions, the experiment from which our time series example was extracted would have been better served if the data rate had been halved and the gaps filled instead.
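The relation between series length, serial coherence time, degrees of freedom, and natural frequencies summarized above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and truncating the ratio T/$\tau$ to an integer is an assumption made here (the text does not state how a fractional number of degrees of freedom is rounded).

```python
def natural_frequencies(total_time, coherence_time):
    """Natural (independent) frequencies of a series of length total_time.

    The fundamental is 1/total_time; harmonics are evenly spaced, and only
    nu/2 of them are kept, where nu = total_time / coherence_time.
    """
    nu = int(total_time / coherence_time)   # degrees of freedom (truncated: assumption)
    n_freq = nu // 2                        # nu/2 possible frequencies
    fundamental = 1.0 / total_time
    return [k * fundamental for k in range(1, n_freq + 1)]


def band_limit(coherence_time):
    """Nyquist frequency associated with the serial coherence time."""
    return 1.0 / (2.0 * coherence_time)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the bookkeeping.
    freqs = natural_frequencies(total_time=100.0, coherence_time=4.0)
    print(len(freqs), freqs[-1], band_limit(4.0))
```

The highest natural frequency never exceeds the band limit 1/(2$\tau$), so sampling faster than one point per coherence time adds points but no frequencies.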
Figure 14. Periodograms before and after de-trending the time series of Fig. 1 by differencing. Again, the obvious change in the spectral character is noticeable. In the upper panel the fundamental frequency has been redefined by cropping the data length (see Figure 7). [Plot: two panels, power versus frequency (Hertz), with the 0.95 level.]

Once the appropriate natural frequencies of a time series are determined (based on the serial coherence time of the data) and the periodogram is obtained, the statistical significance tests can then be applied in order to find their statistical validity. The Schuster-Walker and the (corrected) Lomb-Scargle tests are asymptotic tests more applicable to large samples, while the Fisher test is useful for any sample size and is the preferred statistical test to use. Since one does not know a priori what frequencies may be present in the time series, it is possible that they may not be the same frequencies as the natural frequencies used in the periodogram. Redefinition of the fundamental frequency, by cropping of the length of the time series so that one (or more)
of the natural frequencies coincide with the frequencies present in the time series, presents a solution for this frequency mismatch. The price paid for this solution is the loss of some information forever. Obviously, this cropping can only be done once, lest the required independence be lost. Although this cropping of the length of the time series to match frequencies appears to be arbitrary, it is no more arbitrary than the selection of the length in the original time series before it was to be investigated for statistical significance.

The tests used to search for statistical significance have a clear and strong message: the significance of any feature of the power spectrum derived from a time series is relative to the total power of the series. Any manipulation that changes, shifts, removes, or increases the power in one region of a power spectrum, relative to another region, will alter the significance of the results obtained. Basically, such manipulation arbitrarily changes the character and amplitude of the sample variance and, thus, the results. There are three mainstream procedures that perform such an operation: filtering, tapering (which includes pre-whitening) and de-trending. As shown in the text, all of these procedures preferentially and arbitrarily change the power in a portion of a power spectrum, and yet give no account for this power manipulation. These manipulations usually also change the number of degrees of freedom from those available in the original sample. Carrying out statistical analysis on a periodogram from data which have been tampered with in this fashion will give almost any desired answer. In particular, if a feature is not found to be statistically significant in the original power spectrum, it should not acquire any further significance by frequency dependent manipulation of the power spectrum. To assign to the original data the results of significance tests performed on data whose power spectrum has been arbitrarily manipulated is simply invalid. These significance tests are no longer being made on the original data and are no longer relevant to them.

Although the use of periodogram techniques sacrifices the phase information in order to determine statistical significance, this information is still available for those frequencies whose power has been tested to be statistically significant. If anything, the phase information is made more significant by its being associated with a frequency whose power has been determined (statistically) to exist.

In conclusion, the present study has rigorously investigated the process of determining the presence of meaningful, statistically significant, coherent oscillations in time series when using periodogram techniques. As part of this process, it has been shown there exist both fundamental limitations on the extent of available information present in a given time series and also pitfalls to be avoided. In particular, it has been shown that some of the commonly used data manipulation methods can lead to incorrect and misleading interpretations, since they can alter the information contained in the original data to such an extent that the final results obtained are no longer relevant to the original data. The methodology described here has been employed on a real sample of upper atmosphere wind measurements in order to illustrate the extent of extractable meaningful information, as well as to suggest a direction for the statistical analysis of the large existing collection of these measurements. This approach would also be applicable to many other available data collections such as incoherent radar information, satellite observations of atmospheric properties, meteorological records, solar plasma measurements, etc.

Besides the periodogram, there exist other methods to extract information from a time series. These methods and the criteria for their significance are being investigated, and the results from this study will be reported later.

Appendix

In this appendix we use synthetic data in order to illustrate further the properties and behavior of periodograms when employed in the search for statistically significant coherent oscillations in a time series. Examining a synthetic time series, whose properties are known in advance, makes it possible to show clearly some of the results derived in detail in the preceding main text.

The specific synthetic time series to be employed in the following text consists of six equal amplitude frequencies randomly distributed across the spectrum, with the arbitrary addition of Gaussian noise of known variance. The resultant time series looks, to the eye, very similar to the real experimental data which are found in the aeronomical studies with which this author is associated. Further, the amplitudes at these frequencies will be purposely changed to illustrate the sometimes profound differences that these variations in amplitude can have in the final results.

The starting specific synthetic series is 256 seconds long with a sampling rate of one per second, and consists of six oscillations of 80-, 25-, 9-, 4-, 3.2-, and 2.2-s periodicity. As previously stated, all these periodicities begin with equal amplitude, arbitrarily set equal to 12 units. Finally, Gaussian random noise of zero mean and 15-unit standard deviation has been added to the series made up from these six frequencies. The top panel of Figure A1 shows the original equal-amplitude series, while the lower two panels illustrate an arbitrarily increasing amplitude of oscillation of the 80-s periodicity. The applicable amplitude in the middle panel is given as a(80): 18, indicating that the 80-s periodicity oscillation amplitude is 18 units, increasing to 25 units in the lower panel.

Referring to Figure A1, the reader can see the obvious presence of a lower-frequency oscillation, as well as an increase in its amplitude. Figure A2 shows the periodograms for Figure A1.

Figure A1. Synthetic series generated with six frequencies and the arbitrary addition of Gaussian noise of known variance. The upper panel has all six frequencies of equal amplitude (12 units), while the middle and lower panels have increasing amplitude for the oscillation with 80-s periodicity, as indicated. The added Gaussian noise has a zero mean and a standard deviation of 15 units. [Plot: three panels, signal versus time (seconds); the lower panel is labeled a(80): 25.]
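A series of the kind described in this appendix and shown in Figure A1 can be generated with the sketch below. The periods, amplitudes, length, sampling rate, and noise level are taken from the text; the random phases, the use of exactly 256 points, the seed, and the function name are assumptions made here, since the text does not specify them.

```python
import math
import random

# Periods (seconds) of the six oscillations given in the Appendix.
PERIODS_S = (80.0, 25.0, 9.0, 4.0, 3.2, 2.2)


def synthetic_series(amplitudes=None, n_points=256, dt=1.0, noise_sd=15.0, seed=0):
    """Sum of six sinusoids plus zero-mean Gaussian noise.

    amplitudes: one amplitude per period (default: 12 units each).
    Phases are drawn at random (an assumption of this sketch).
    """
    if amplitudes is None:
        amplitudes = [12.0] * len(PERIODS_S)
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in PERIODS_S]
    series = []
    for k in range(n_points):
        t = k * dt
        signal = sum(
            a * math.sin(2.0 * math.pi * t / p + ph)
            for a, p, ph in zip(amplitudes, PERIODS_S, phases)
        )
        series.append(signal + rng.gauss(0.0, noise_sd))
    return series


if __name__ == "__main__":
    equal = synthetic_series()                                   # upper panel
    middle = synthetic_series(amplitudes=[18.0] + [12.0] * 5)    # a(80): 18
    lower = synthetic_series(amplitudes=[25.0] + [12.0] * 5)     # a(80): 25
    print(len(equal), len(middle), len(lower))
```

Because the phases and noise realization are arbitrary, the output will not reproduce Figure A1 point for point; only its statistical character is comparable.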
HERNANDEZ: TIME SERIES, PERIODOGRAMS, AND SIGNIFICANCE 10,367

Figure A2. Periodograms for the time series given in Figure A1.
Panels a) and b) refer to the equal-amplitude series of the upper
panel of Figure A1. Panel a) presumes all the 'measurements' in
the series are independent. Panel b) uses the number (162) of
degrees of freedom derived from the serial coherence of the data.
Panels c) and d) apply to the middle and lower panels of Figure A1,
where the amplitude of the lowest frequency is arbitrarily
increased. Note the decrease in the number (106 and 76) of the
degrees of freedom available and consequent decrease in the
spectral region available for examination. The Fisher 0.95
confidence significance tests are shown.
[Plot omitted. Abscissa: Frequency (Hertz).]

Figure A3. Six-frequency synthetic series, where the first five
oscillations have equal (12 units) amplitude while the 2.2-s
oscillation now has higher amplitude (28 units), as denoted.
Gaussian noise with a zero mean and a standard deviation of 15
units has been added. Compare with the upper panel of Figure A1.
[Plot omitted. Axes: Signal versus Time (seconds).]

Figure A4. Periodogram of the time series of Figure A3. Note
the increase in the number of degrees of freedom brought by the
increase in amplitude of the highest frequency oscillation. The
Fisher test is also shown.
[Plot omitted. Abscissa: Frequency (Hertz).]

Figure A5. Synthetic time series with equally spaced data gaps.
[Plot omitted. Axes: Signal versus Time (seconds).]

The Fisher statistical test for 0.95 significance is shown for the
applicable number of degrees of freedom. Figure A2a) illustrates
the periodogram when the assumption is made that all the
'measurements' making up the series are independent. The six
frequencies making up the time series are clearly visible. However,
their calculated powers appear to be uneven. This is no surprise,
since the periodogram-calculated frequencies do not exactly
match those of the six oscillations present. Figure A2b) shows
the limited spectral range which can be examined because of the
limited number (162) of available degrees of freedom when the
serial coherence of the series is taken into account. Statistically,
we can say nothing about the 2.2-s periodicity; it is not
independent. Note that this periodogram has been used in Figure 13.
Figures A2c) and A2d) correspond to the middle and lower panels
of Figure A1, where the amplitude of the 80-s oscillation has
been arbitrarily increased, and we note the effects of a further
decrease in the number of degrees of freedom available, down to
106 and 76, respectively. Besides the loss in frequency range
available for examination if a low-frequency oscillation amplitude
increases, the relative power of the other surviving frequencies
becomes a smaller portion of the total power, leading to a
loss in significance of these now weaker power features.
Because of the strong effect that low frequencies in a time
series have on the resultant periodogram, i.e., increasing the serial
coherence time and thus lowering the number of degrees of freedom
available, the question arises as to how much power the
high-frequency oscillations should have before they can
become amenable to examination. Figure A3 shows the six-frequency
series in which the amplitude of the highest frequency (2.2-s
oscillation period) is arbitrarily increased until the number of
degrees of freedom of the time series is large enough to allow
examination of the region of the spectrum where it resides.
Figure A4 gives the resultant periodogram showing the presence
of this sixth oscillation, and clearly demonstrates that, in the
presence of lower frequencies, higher-frequency oscillations need to
have a higher amplitude (and power) in order to overcome the
serial coherence of any of the lower frequency components
present in a given spectrum. In general, one can conclude that
the higher frequencies possibly present in a time series must have
much larger amplitude than the existing lower frequencies in
order to be amenable to statistical significance examination. It
should be noted that, incidental to this requirement of larger
power for a given high frequency to be amenable to examination,
this higher power would tend to diminish the significance of
the rest of the lower-frequency spectrum. This can be seen by
comparing Figure A2(a) with Figure A4.
In Section 2 we discussed the need that the data from
which a periodogram would be derived should ideally be equally
spaced. However, with the Lomb [1976] approach it is possible
to handle data gaps and unequally spaced data. In both these
cases of irregularity, the gaps and the unequally spaced data
should have nearly random separation, lest other problems
appear. The randomness of the separation can be easily tested
using standard techniques. Here we illustrate the difficulties
encountered when data gaps in a time series are themselves
periodic, as when measurements are only possible for part of the
day or when missing data points have a recurrent pattern. For
consistency with previous material in the text, we will use
seconds in the abscissa as a proxy for hours. For the present
synthetic investigation we will presume that the process to be
observed consists of a 12-s oscillation, which, because of
limitations in the measuring technique, can only be observed during
the night. The results of these hypothetical observations are
illustrated in Figure A5. The periodogram analysis of these data is
given in Figure A6, where we have presumed that the individual
'measurements' are independent and the series is quasi-stationary.
Although we know that the only oscillation of interest
present in the measurements is a 12-s periodicity, other
significant frequencies are apparent. The results shown in
Figure A6 are not surprising, since they simply show the amplitude
modulation of the 12-s oscillation by the recurrent 24-s gaps.
From Fourier transform theory one should expect that frequencies
at $f_{12} + k f_{24}$, where $k$ is an integer, should appear, as
can be seen in Figure A6. Should there exist other true frequencies
in the original data series, their interpretation would be
cross-contaminated by the sidebands of the other frequencies. This
example is simply an ill-posed problem for statistical significance
analysis by periodogram techniques, but it is illustrative of the
dangers lurking when the methodology is improperly applied.

Figure A6. Periodogram of the time series of Figure A5. Note
the modulation effect of the data gaps. The Fisher significance
test is also shown.
[Plot omitted. Abscissa: Frequency (Hertz).]

Acknowledgments. The author would like to express his appreciation
to G. J. Fraser for providing the measurements used in the data
sample. Also, the author would like to thank K. C. Clark,
G. J. Fraser, and R. W. Smith for stimulating discussions. This
investigation was supported in part by grants ATM-9610200 and
OPP-9615157 from the National Science Foundation.
Janet G. Luhmann thanks David Altadill and another referee for their
assistance in evaluating this paper.

References

Blackman, R. B., and J. W. Tukey, The Measurement of Power Spectra
From the Point of View of Communications Engineering, Dover
Publications, New York, 1959.
Bracewell, R., The Fourier Transform and Its Applications,
McGraw-Hill, New York, 1965.
Fisher, R. A., Tests of significance in harmonic analysis, Proc. R.
Soc. London, Ser. A, 125, 54-59, 1929.
Grenander, U., and M. Rosenblatt, Statistical Analysis of Stationary
Time Series, Chelsea, New York, 1957.
Hamilton, K., and R. R. Garcia, Theory and observations of the
short-period normal mode oscillations of the atmosphere, J. Geophys.
Res., 91, 11,867-11,875, 1986.
Harrison, D. E., and N. K. Larkin, Darwin sea level pressure,
1876-1996: Evidence for climate change?, Geophys. Res. Lett., 24,
1779-1782, 1997.
Hoel, P. G., Introduction to Mathematical Statistics, John Wiley,
New York, 1954.
Kendall, M. G., The Advanced Theory of Statistics, vol. 2, Charles
Griffin, London, 1948.
Leith, C. E., The standard error of time-average estimates of
climatic means, J. Appl. Meteorol., 12, 1066-1069, 1973.
Little, R. J. A., and D. A. Rubin, Statistical Analysis with Missing
Data, John Wiley, Inc., New York, 1987.
Lomb, N. R., Least-squares frequency analysis of unequally spaced
data, Astrophys. Space Sci., 39, 447-462, 1976.
Percival, D. B., and A. T. Walden, Spectral Analysis for Physical
Applications, Cambridge Univ. Press, Cambridge, 1993.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling,
Numerical Recipes, Cambridge Univ. Press, London, 1986.
Priestley, M. B., Spectral Analysis and Time Series, Academic Press,
London, 1981.
Scargle, J. D., Studies in astronomical time series analysis. II.
Statistical aspects of spectral analysis of unevenly spaced data,
Astrophys. J., 263, 835-853, 1982.
Schuster, A., On the investigation of hidden periodicities with
application to a supposed 26 day period of meteorological phenomena,
Terr. Mag. Atmos. Elect., 3, 13-41, 1898.
Walker, G. T., Correlation in seasonal variations of weather, III. On
the criteria for the reality of relationships or periodicities, Mem.
Indian. Meteorol. Dep., 21(9), 13-15, 1914.
Whittle, P., The simultaneous estimation of a time series harmonic
components and covariance structure, Trab. Estadistica, 3, 43-57,
1952.

G. Hernandez, Graduate Program in Geophysics, Box 351650,
University of Washington, Seattle, WA 98195-1650. (e-mail:
hernandez@u.washington.edu)

(Received September 4, 1998; revised January 5, 1999;
accepted January 5, 1999.)
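
The periodic-gap example of Figures A5 and A6 (a 12-s oscillation observable only during the "night" half of a recurrent 24-s cycle) can be illustrated with a short sketch. It is a simplification under stated assumptions: the gaps are zero-filled and an ordinary FFT periodogram is used rather than the Lomb [1976] approach, no noise is added, and the 1-s sampling and 240-s record length are hypothetical choices made so that the frequencies fall exactly on periodogram bins.

```python
import numpy as np

DT = 1.0   # sampling interval in seconds (assumed)
N = 240    # record length: ten complete 24-s gap cycles (assumed)
t = np.arange(N) * DT

signal = np.sin(2.0 * np.pi * t / 12.0)    # the 12-s oscillation
observed = (t % 24.0) < 12.0               # "night": first half of each cycle
gated = np.where(observed, signal, 0.0)    # zero-filled gaps (simplification)

power = np.abs(np.fft.rfft(gated - gated.mean())) ** 2 / N
freqs = np.fft.rfftfreq(N, d=DT)

# Express the four strongest peaks as f12 + k * f24.
f12, f24 = 1.0 / 12.0, 1.0 / 24.0
strongest = sorted(int(i) for i in np.argsort(power)[-4:])
for i in strongest:
    k = (freqs[i] - f12) / f24
    print(f"{freqs[i]:.4f} Hz  ->  f12 {k:+.0f} * f24")
```

With this even split between observed and missing intervals, the gate has only odd harmonics of the gap frequency, so the power appears at the true 12-s frequency and at sidebands offset from it by odd multiples of the 24-s gap frequency. A different duty cycle would populate other values of k.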

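
The Fisher 0.95 levels drawn in Figures A2, A4, and A6 rest on the test of Fisher [1929] for the largest periodogram ordinate. A minimal sketch of that test follows, assuming white Gaussian noise and m independent periodogram ordinates. The function names are hypothetical, and m is left as an input: how it follows from the number of degrees of freedom is set out in the main text and is not reproduced here.

```python
from math import comb, floor

def fisher_g_pvalue(g, m):
    """Probability that the largest of m independent periodogram ordinates
    holds a share >= g of the total power under white Gaussian noise."""
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1]")
    total = 0.0
    for j in range(1, min(floor(1.0 / g), m) + 1):
        total += (-1) ** (j - 1) * comb(m, j) * (1.0 - j * g) ** (m - 1)
    return min(max(total, 0.0), 1.0)

def fisher_g_critical(m, confidence=0.95, tol=1e-10):
    """Share of total power a peak must exceed to be significant at the
    given confidence level, found by bisection on the p-value."""
    alpha = 1.0 - confidence
    lo, hi = 1.0 / m, 1.0   # the largest share can never be below 1/m
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fisher_g_pvalue(mid, m) > alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

A peak is then compared by dividing its power by the summed power of the m ordinates and checking the ratio against `fisher_g_critical(m)`. The alternating sum loses precision for g close to 1/m when m is large, which does not affect the critical values at the 0.95 level.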