
#N/A

Missing Values

This issue is the first in a series of articles that explore the data preparation aspect of time series analysis. Data preparation is often overlooked by analysts, but we believe it is a vital phase that wields a vast influence on the overall analysis and modeling process. The vast majority of time series and econometric theories assume input time series to be stationary and homogeneous, with equally spaced observations and values that are present and real. In practice, we often handle samples with missing values, unequally spaced observations, possible outliers, mean/variance dependency, restricted value ranges and other phenomena. The aim of this series of articles is to address each of these problems and introduce practical methods to overcome them.

In this issue, we start with the sampling assumptions of the time series: equal spacing and completeness. Then we consider a time series with missing values and discuss how to represent them in Excel, with the aid of NumXL processing. Finally, we look at unequally spaced time series, how they come into existence, how they are related to the missing values scenario, and what to do with them.

Sampling
The common (perfect) situation for a time series sample is one that has equally spaced observations and present values for all points. This arises either because observations are made deliberately at even intervals (continuous process), or because the process only generates outputs at such intervals in time (discrete process).

Furthermore, the time unit for a sampling period (i.e. step) between two consecutive observations can be either absolute (e.g. daily, weekly, monthly, or annual), or based on a holiday calendar (i.e. adjusted for weekends and holidays). For example, a daily financial time series of IBM stock closing prices is based on the NYSE holidays calendar, so each observation is taken on a NYSE trading day (open/close).

With respect to time series modeling and forecasting, it is not important whether we use absolute time or adjust for weekends and holidays. What is important is how we interpret the out-of-sample dates, as they too are based on the same sampling method.

Next, let's examine some cases where the input time series is not so perfect.
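The point about out-of-sample dates can be illustrated with a minimal Python sketch. It assumes a weekday calendar with a user-supplied holiday set; the function name `next_sampling_dates` and the holiday set are hypothetical and are not part of NumXL or an official NYSE calendar.

```python
from datetime import date, timedelta

def next_sampling_dates(last_obs, steps, holidays=frozenset()):
    """Return the next `steps` dates on a weekday calendar that skips `holidays`."""
    out = []
    current = last_obs
    while len(out) < steps:
        current += timedelta(days=1)
        # weekday() returns 0..4 for Monday..Friday
        if current.weekday() < 5 and current not in holidays:
            out.append(current)
    return out

# Hypothetical holiday set, for illustration only.
holidays = {date(2012, 1, 2)}

# The last in-sample observation falls on Friday, 30 December 2011.
forecast_dates = next_sampling_dates(date(2011, 12, 30), 3, holidays)
```

A three-step forecast from that Friday is therefore dated on the next three days of the same calendar, not on the next three absolute days.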

Tips & Hints: Missing Values

Spider Financial Corp, 2012

Issue 1: Missing values


In some situations, one or more observation dates yield invalid or missing values. These values are designated as "not a number", or NaN for short. In Excel, NaN is identified by the special #N/A representation. A few built-in functions can be used to generate or detect it (e.g. NA(), ISNA(.), IFERROR(.), etc.) or to skip it (e.g. AGGREGATE(.) with its ignore-errors option), but most other functions, including MIN(.) and MAX(.), simply propagate the error.

In time series analysis, we often encounter missing values, either in the original raw time series or as a result of a time series operator (e.g. lag, differencing, etc.).

Q: What can we do with a time series with missing values?

NumXL has two simple rules:

1. The missing values at the beginning or the end of the time series are simply ignored. NumXL will truncate the input time series to start from the first non-missing value and end with the last non-missing value.
2. The intermediate missing values are considered serious flaws in the input time series, and NumXL can't process them.

These rules raise the question: how do we handle missing intermediate values?
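Before turning to that question, the two rules can be sketched in Python. This is an illustration of the stated behavior, not NumXL's implementation; the helper names `is_missing` and `trim_missing` are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math

def is_missing(x):
    """Treat None and floating-point NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def trim_missing(series):
    """Drop leading/trailing missing values.

    Returns (trimmed_series, has_intermediate_gap).
    """
    present = [i for i, x in enumerate(series) if not is_missing(x)]
    if not present:
        return [], False
    trimmed = series[present[0]: present[-1] + 1]
    # Rule 2: any missing value left after trimming is an intermediate gap.
    has_gap = any(is_missing(x) for x in trimmed)
    return trimmed, has_gap

nan = float("nan")
raw = [nan, nan, 1.0, 2.0, nan, 4.0, nan]
trimmed, has_gap = trim_missing(raw)
```

Here the two leading and one trailing missing values are dropped (rule 1), while the gap between 2.0 and 4.0 remains and is flagged (rule 2).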

Many techniques have been proposed to handle time series with missing data, but we can summarize these proposals with two principles: ignore and interpolate.


IGNORE

The ignore solution simply drops the missing value from the time series. You can use the NumXL RMNA(.) function for this purpose. However, you should approach this solution cautiously as it alters the sampling of the time series itself.

INTERPOLATE

The interpolate approach replaces the missing values with interpolated values. There are several interpolation algorithms: linear, polynomial, smoothing, spline, filtering, etc.

Interpolation does not change the frequency of the sampling, but it may affect the perceived dynamics of the underlying process if it is used for several points in the time series.

NumXL comes with an interpolation function, INTERPOLATE, which supports four (4) different interpolation algorithms:

1. Forward flat interpolation
2. Backward flat interpolation
3. Linear interpolation
4. Cubic spline interpolation

NOTE: The INTERPOLATE function discards all points with missing values, so we can use the function directly on the raw data set without any intermediate preparation.
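To make the flat and linear variants concrete, here is a minimal Python sketch. It is not the NumXL INTERPOLATE function: the name `interpolate`, its signature, and the reading of "forward flat" as carrying the previous observation forward (and "backward flat" as pulling the next observation back) are assumptions made for illustration. Cubic spline interpolation is omitted for brevity.

```python
import math

def interpolate(times, values, t, method="linear"):
    """Interpolate at time t using only the non-missing points."""
    # Discard points with missing values, then order by time.
    pts = sorted(
        (ti, vi) for ti, vi in zip(times, values)
        if vi is not None and not math.isnan(vi)
    )
    for ti, vi in pts:
        if ti == t:          # t coincides with an observed point
            return vi
    before = [p for p in pts if p[0] < t]
    after = [p for p in pts if p[0] > t]
    if not before or not after:
        raise ValueError("t lies outside the observed range")
    (t0, y0), (t1, y1) = before[-1], after[0]
    if method == "forward":   # assumed meaning: hold the previous value
        return y0
    if method == "backward":  # assumed meaning: use the next value
        return y1
    if method == "linear":
        return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError("unknown method: " + method)

nan = float("nan")
times = [1, 2, 3, 4, 5]
values = [10.0, nan, nan, 16.0, 18.0]

filled_forward = interpolate(times, values, 2, "forward")
filled_backward = interpolate(times, values, 2, "backward")
filled_linear = [interpolate(times, values, t, "linear") for t in (2, 3)]
```

Because the missing points are filtered out first, the raw series can be passed in directly, in the same spirit as the note above.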


Issue 2: Unevenly spaced time series


Unevenly spaced time series are common in many real-life applications when measurements are constrained by practical conditions. The irregularity of observations can have several fundamental reasons. First, any event-driven collection process (in which observations are collected when some event occurs) is inherently irregular. Second, in such applications as sensor networks or any distributed monitoring infrastructure, data collection is distributed and collection agents can't easily synchronize with one another. In addition, the sampling intervals and policies may be different. Finally, measurements sometimes cannot be made regularly or may have to be interrupted due to some events (either foreseen or not).

Note: Unlike the equally spaced time series case, intermediate observations with missing values can be safely dropped from the original series without any loss of information, and, obviously, the resultant series is unevenly spaced as well.

Many techniques have been proposed to handle time series with missing data, which in the limit can be viewed as irregularly sampled.

In data analysis practice, irregularity is a recognized data characteristic, and practitioners deal with it heuristically.

Solution 1: Convert to equally spaced time series


1. IGNORE: Ignore the irregularity in the times and treat the data as if it were regular.
2. RESAMPLE: Resample using a lower sampling rate. The reduction simplifies the problem to one that has already been thoroughly analyzed and for which many approaches are available.
   Note: For a price time series, down-sampling requires taking the last observation in the new sample period. For a log-return series under this strategy, the resampled return is the cumulative return over all original periods within the new sample period.
3. INTERPOLATE: Interpolate the intermediate missing values and convert the series to one with equally spaced sampling times. While this is a reasonable heuristic for dealing with missing values, the interpolation process typically results in a significant bias (e.g. smoothing of the data) that changes the dynamics of the process, so equally spaced models fitted to the interpolated series are not appropriate if the data is truly unequally spaced.
4. Kernel Smoothing
5. Brownian Bridging: A number of authors have suggested using continuous-time diffusion processes to find missing values. In principle, to interpolate a missing value, we assume a Brownian motion between the non-missing observations immediately before and after it.

Note: As of the date of this issue, NumXL does not support the Brownian bridging interpolation method.
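The resampling note in the list above (last price per period, cumulative log return per period) can be checked with a short Python sketch. The helper names `downsample_last` and `block_sum` and the sample prices are hypothetical; the sketch assumes an equally spaced input and a fixed block length `k`.

```python
import math

def downsample_last(prices, k):
    """Keep the last price in each block of k consecutive observations."""
    return [prices[i + k - 1] for i in range(0, len(prices) - k + 1, k)]

def block_sum(returns, k):
    """Sum log returns over consecutive blocks of k periods."""
    return [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]

def log_returns(prices):
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

k = 2
prices = [100.0, 101.0, 103.0, 102.0, 105.0, 104.0]

# Down-sampled prices: the last observation of each 2-period block.
low_freq_prices = downsample_last(prices, k)

# Log returns computed directly on the down-sampled prices ...
direct = log_returns(low_freq_prices)

# ... equal the summed high-frequency log returns over the same spans.
# The first down-sampled price sits at index k - 1, so the sums start there.
summed = block_sum(log_returns(prices)[k - 1:], k)
```

The equality holds because log returns are additive over time, which is why the cumulative return is the natural resampled value.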

Solution 2: Use unequally spaced time series models


These models are slightly more complex than their equally spaced counterparts, and many can be viewed as an extension of the equally spaced time series models.

Supposing $Y(t)$ is a time series with irregular sampling, we can decompose it into:

$$Y(t) = a(t) + X(t)$$

Where:

1. $a(t)$ is a slowly changing deterministic function (trend component)
2. $X(t)$ is a random noise component

In general, one can only observe $Y(t)$. Our first goal is to estimate the deterministic component and extract the random noise $X(t) = Y(t) - a(t)$; our second goal is to find a satisfactory probabilistic model for the process $X(t)$.

Note: As of the date of this issue, NumXL does not support unevenly spaced time series models.
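As a rough illustration of the first goal, the trend of an irregularly sampled series can be estimated with a kernel smoother and subtracted to leave the noise. This is only a sketch under stated assumptions: the source does not prescribe an estimator, so the Gaussian-kernel (Nadaraya-Watson) weighted average, the bandwidth of 1.0, the function name `kernel_trend`, and the sample data are all choices made here for illustration.

```python
import math

def kernel_trend(times, values, t, bandwidth):
    """Gaussian-kernel weighted average of `values` around time t.

    Works directly on irregular sampling times: each observation is
    weighted by its distance in time from t, not by its index.
    """
    weights = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in times]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

# Hypothetical irregularly sampled observations of Y(t).
times = [0.0, 0.7, 1.1, 2.4, 3.0, 4.2, 4.5, 6.0]
values = [1.0, 1.3, 1.2, 2.1, 2.0, 2.9, 2.7, 3.6]

# Estimate of the deterministic component at each sampling time.
trend = [kernel_trend(times, values, t, bandwidth=1.0) for t in times]

# Extracted noise: the observed value minus the estimated trend.
noise = [y - a for y, a in zip(values, trend)]
```

The bandwidth controls how slowly the estimated trend is allowed to change; a model for the extracted noise (the second goal) would then be fitted to `noise` together with its irregular `times`.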
