Missing Values
This issue is the first in a series of articles that explore the data preparation aspect of time series analysis. Data preparation is often overlooked by analysts, but we believe it is a vital phase that wields a vast influence on the overall analysis and modeling process. The vast majority of time series and econometric theories assume input time series to be stationary and homogeneous, with equally spaced observations and values that are present and real. In practice, we often handle samples with missing values, unequally spaced observations, possible outliers, mean/variance dependency, restricted value ranges and other phenomena. The aim of this series of articles is to address each of these problems and introduce practical methods to overcome them.

In this issue, we start with the sampling assumptions of the time series: equal spacing and completeness. Then we consider a time series with missing values and discuss how to represent them in Excel, with the aid of NumXL processing. Finally, we look at unequally spaced time series, how they come into existence, how they are related to the missing values scenario, and what to do with them.
Sampling
The common (perfect) situation for a time series sample is one that has equally spaced observations and present values for all points. This arises either because observations are made deliberately at even intervals (continuous process), or because the process only generates outputs at such intervals in time (discrete process).

Furthermore, the time unit for a sampling period (i.e. step) between two consecutive observations can be either absolute (e.g. daily, weekly, monthly, or annual), or based on a holiday calendar (i.e. adjusted for weekends and holidays). For example, a daily financial time series of IBM stock closing prices is based on the NYSE holiday calendar, so each observation is taken on a NYSE trading day (open/close).

With respect to time series modeling and forecasting, it is not important whether we use absolute time or adjust for weekends and holidays. What is important is how we interpret the out-of-sample dates, as they too are based on the same sampling method.

Next, let's examine some cases where the input time series is not so perfect.
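As a rough illustration of the sampling assumptions above, the following Python sketch lists the sampling dates that are expected but absent from a daily series. The function name `missing_sampling_dates`, its parameters, and the sample dates are hypothetical; the holiday set is supplied by the caller and is not an actual NYSE calendar.

```python
from datetime import date, timedelta


def missing_sampling_dates(dates, step=timedelta(days=1),
                           holidays=frozenset(), skip_weekends=False):
    """Return expected sampling dates that have no observation.

    `dates` is assumed to be sorted in ascending order. With
    skip_weekends=False the calendar is "absolute"; otherwise Saturdays,
    Sundays and the caller-supplied `holidays` are not expected.
    """
    observed = set(dates)
    missing = []
    current = dates[0]
    while current <= dates[-1]:
        is_weekend = skip_weekends and current.weekday() >= 5
        if not is_weekend and current not in holidays and current not in observed:
            missing.append(current)
        current += step
    return missing


# Hypothetical daily observations (Tue, Wed, Fri, Mon); Thursday is absent.
obs = [date(2012, 1, 3), date(2012, 1, 4), date(2012, 1, 6), date(2012, 1, 9)]
print(missing_sampling_dates(obs))                      # absolute calendar
print(missing_sampling_dates(obs, skip_weekends=True))  # weekend-adjusted
```

Under the absolute calendar the weekend counts as a gap; under the weekend-adjusted calendar only the Thursday does, which is why the choice of sampling method has to be applied consistently to out-of-sample dates as well.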
Tips & Hints: Missing Values
Spider Financial Corp, 2012
Many techniques have been proposed to handle time series with missing data, but we can summarize these proposals with two principles: ignore and interpolate.
IGNORE
The ignore solution simply drops the missing value from the time series. You can use the NumXL RMNA(.) function for this purpose. However, you should approach this solution cautiously, as it alters the sampling of the time series itself.

INTERPOLATE
The interpolate approach replaces the missing values with interpolated values. There are several interpolation algorithms: linear, polynomial, smoothing, spline, filtering, etc.

Interpolation does not change the frequency of the sampling, but it may affect the perceived dynamics of the underlying process if it is used for several points in the time series.

NumXL comes with an interpolation function, INTERPOLATE, which supports four (4) different interpolation algorithms:
Forward & Backward Flat Interpolation
NOTE: The INTERPOLATE function discards all points with missing values, so we can use the function directly on the raw data set without any intermediate preparation.
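The ignore and interpolate principles can be sketched in a few lines of Python. This is only an illustration of the ideas, not NumXL's RMNA or INTERPOLATE implementation: the function names are hypothetical, missing values are assumed to be encoded as None or NaN, and the linear variant is included as one of the generic algorithms named above, assuming equally spaced sampling times.

```python
import math


def is_missing(v):
    """Assumed encoding: a missing value is None or a float NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop_missing(values):
    """'Ignore' approach: remove missing points (this alters the sampling)."""
    return [v for v in values if not is_missing(v)]


def forward_flat(values):
    """Carry the last observed value forward; leading gaps become None."""
    out, last = [], None
    for v in values:
        if not is_missing(v):
            last = v
        out.append(last)
    return out


def backward_flat(values):
    """Carry the next observed value backward; trailing gaps become None."""
    return forward_flat(values[::-1])[::-1]


def linear(values):
    """Interpolate linearly between the nearest observed neighbours.

    Leading and trailing gaps are left unchanged.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if not is_missing(v)]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            w = (i - lo) / (hi - lo)
            out[i] = values[lo] + w * (values[hi] - values[lo])
    return out


y = [1.0, None, None, 4.0, None, 6.0]  # hypothetical sample
print(drop_missing(y))   # shorter series: the sampling has changed
print(forward_flat(y))   # same length, gaps filled with earlier values
print(backward_flat(y))  # same length, gaps filled with later values
print(linear(y))         # same length, gaps filled along straight lines
```

The first function shortens the series, while the other three keep its length and sampling frequency, which mirrors the trade-off described above.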
3. INTERPOLATE: Interpolate the intermediate missing values and convert the series to one with equally spaced sampling times. While this is a reasonable heuristic for dealing with missing values, the interpolation process typically results in a significant bias (e.g. smoothing of the data) that changes the dynamics of the process; thus these models cannot be applied if the data is truly unequally spaced.
4. Kernel Smoothing
5. Brownian Bridging: A number of authors have suggested using continuous-time diffusion processes to find missing values. In principle, to interpolate a missing value, we assume a Brownian motion between the nonmissing observations immediately prior to and after the missing value.
Note: As of the date of this issue, NumXL does not support the Brownian bridging interpolation method.
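To make the Brownian bridging idea more concrete, here is a minimal Python sketch that samples the missing points from a Brownian motion pinned at the two neighbouring observations. It is not a NumXL function: the name `brownian_bridge_fill`, its arguments, and the constant volatility `sigma` (which would have to be estimated or chosen by the analyst) are all assumptions made for illustration.

```python
import math
import random


def brownian_bridge_fill(t0, y0, t1, y1, times, sigma, rng=None):
    """Sample values at `times` from a Brownian bridge.

    The bridge is pinned at (t0, y0) and (t1, y1). `times` must be
    increasing and lie strictly between t0 and t1. `sigma` is an assumed
    constant volatility per unit of time.
    """
    rng = rng or random.Random()
    filled = []
    prev_t, prev_y = t0, y0
    for t in times:
        # Conditional on the previous point and the right end point, the
        # value at t is Gaussian with the mean and variance below.
        frac = (t - prev_t) / (t1 - prev_t)
        mean = prev_y + frac * (y1 - prev_y)
        var = sigma ** 2 * (t - prev_t) * (t1 - t) / (t1 - prev_t)
        prev_y = rng.gauss(mean, math.sqrt(var))
        prev_t = t
        filled.append(prev_y)
    return filled


# Hypothetical gap: observations at t=0 and t=3, missing values at t=1 and t=2.
print(brownian_bridge_fill(0, 1.0, 3, 4.0, [1, 2], sigma=0.5,
                           rng=random.Random(42)))
```

With `sigma` set to zero the sketch reduces to plain linear interpolation; a positive `sigma` adds randomness between the end points instead of a straight line, which is the motivation for preferring a bridge when smoothing bias is a concern.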
$Y(t) = a(t) + X(t)$
$a(t)$ is a slowly changing deterministic function (trend component)
$X(t)$ is a random noise component
extract the random noise $X(t) = Y(t) - a(t)$; our second goal is to find a satisfactory probabilistic model
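The decomposition above can be illustrated with a short Python sketch that estimates the trend component and subtracts it to obtain the noise. The text does not say how a(t) is estimated, so the centered moving average used here, the window length, and the function names are assumptions made purely for illustration.

```python
def moving_average_trend(y, window=3):
    """Estimate a(t) with a centered moving average (an assumed estimator).

    `window` should be odd; near the edges a shorter window is used.
    """
    half = window // 2
    trend = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        trend.append(sum(y[lo:hi]) / (hi - lo))
    return trend


def extract_noise(y, trend):
    """Noise component: X(t) = Y(t) - a(t)."""
    return [yi - ai for yi, ai in zip(y, trend)]


y = [1.0, 2.5, 2.0, 4.5, 4.0, 6.5]  # hypothetical observations
a = moving_average_trend(y, window=3)
x = extract_noise(y, a)
print(a)
print(x)
```

By construction the trend estimate and the extracted noise add back up to the original series, so the probabilistic model only needs to describe the noise component.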