
Time Series

Karanveer Mohan

Keegan Go

Stephen Boyd

EE103
Stanford University

November 12, 2015

Outline

Introduction

Linear operations

Least-squares

Prediction

Introduction

Time series data

represent time series $x_1, \ldots, x_T$ as $T$-vector $x$

$x_t$ is the value of some quantity at time (period, epoch) $t$, $t = 1, \ldots, T$

examples:

  average temperature at some location on day $t$
  closing price of some stock on (trading) day $t$
  hourly number of users on a website
  altitude of an airplane every 10 seconds
  enrollment in a class every quarter

vector time series: $x_t$ is an $n$-vector; can represent as $T \times n$ matrix


Types of time series

time series can be


smoothly varying, or more wiggly and random

roughly periodic (e.g., hourly temperature)

growing or shrinking (or both)

random but roughly continuous

(these are vague labels)


Melbourne temperature
daily measurements, for 10 years

you can see seasonal (yearly) periodicity


[plot: Melbourne daily temperature over about 4000 days]

Melbourne temperature

zoomed to one year


[plot: daily temperature, zoomed to one year]

Apple stock price


$\log_{10}$ of Apple daily share price, over 30 years, 250 trading days/year

you can see (not steady) growth


[plot: $\log_{10}$ Apple share price over about 9000 trading days]

Log price of Apple

zoomed to one year


[plot: $\log_{10}$ share price, zoomed to one year (days 6000–6250)]

Electricity usage in (one region of) Texas


total usage in 15-minute intervals, over 1 year

you can see variation over the year


[plot: electricity usage per 15-minute interval over one year (about 3.5×10^4 intervals)]

Electricity usage in (one region of) Texas


zoomed to 1 month

you can see daily periodicity and weekend/weekday variation


[plot: electricity usage, zoomed to one month]

Outline

Introduction

Linear operations

Least-squares

Prediction

Linear operations


Down-sampling

$k\times$ down-sampled time series selects every $k$th entry of $x$

can be written as $y = Ax$

for $2\times$ down-sampling, $T$ even,

$$
A = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}
$$
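As a concrete illustration of the selection-matrix view above, here is a minimal numpy sketch (not from the slides; the function name and toy data are ours):

```python
import numpy as np

def downsample_matrix(T, k):
    """Selection matrix A so that y = A @ x keeps every k-th entry of x."""
    assert T % k == 0
    A = np.zeros((T // k, T))
    A[np.arange(T // k), np.arange(0, T, k)] = 1.0
    return A

x = np.arange(10.0)              # toy time series x_1, ..., x_10
y = downsample_matrix(10, 2) @ x
print(y)                         # same result as x[::2]
```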

Down-sampling on Apple log price


$4\times$ down-sample

[plot: Apple log price, 4× down-sampled, days 5100–5120]

Up-sampling

$k\times$ (linear) up-sampling interpolates between entries of $x$

can be written as $y = Ax$

for $2\times$ up-sampling,

$$
A = \begin{bmatrix}
1 & & & & \\
1/2 & 1/2 & & & \\
 & 1 & & & \\
 & 1/2 & 1/2 & & \\
 & & \ddots & & \\
 & & & 1 & \\
 & & & 1/2 & 1/2 \\
 & & & & 1
\end{bmatrix}
$$
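A minimal numpy sketch of 2× linear up-sampling via the interpolation matrix above (the helper name and toy series are ours):

```python
import numpy as np

def upsample_matrix(T):
    """2x linear up-sampling matrix: output has 2*T - 1 entries;
    original samples are kept and midpoints average adjacent samples."""
    A = np.zeros((2 * T - 1, T))
    A[::2, :] = np.eye(T)                    # even output rows copy the samples
    A[1::2, :-1] += 0.5 * np.eye(T - 1)      # odd rows average the two neighbors
    A[1::2, 1:] += 0.5 * np.eye(T - 1)
    return A

x = np.array([1.0, 3.0, 2.0])
print(upsample_matrix(3) @ x)                # [1.  2.  3.  2.5 2. ]
```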

Up-sampling on Apple log price


$4\times$ up-sample

[plot: Apple log price, 4× up-sampled, days 5100–5120]

Smoothing

$k$-long moving average $y$ of $x$ is given by

$$
y_i = \frac{1}{k}\,(x_i + x_{i+1} + \cdots + x_{i+k-1}), \qquad i = 1, \ldots, T - k + 1
$$

can express as $y = Ax$, e.g., for $k = 3$,

$$
A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
 & 1/3 & 1/3 & 1/3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1/3 & 1/3 & 1/3
\end{bmatrix}
$$

can also have trailing or centered smoothing
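A short sketch of the $k$-long moving average using convolution (numpy assumed; the function name is ours):

```python
import numpy as np

def moving_average(x, k):
    """k-long moving average: y_i = (x_i + ... + x_{i+k-1}) / k, length T - k + 1."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

x = np.array([1.0, 2.0, 6.0, 2.0, 1.0])
print(moving_average(x, 3))                  # approximately [3., 3.33, 3.]
```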

Melbourne daily temperature smoothed


centered smoothing with window size 41

[plot: Melbourne daily temperature and its centered 41-day moving average]

First-order differences

(first-order) difference between adjacent entries

discrete analog of derivative

express as $y = Dx$, where $D$ is the $(T-1) \times T$ difference matrix

$$
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{bmatrix}
$$

$\|Dx\|^2$ is a measure of the wiggliness of $x$

$$
\|Dx\|^2 = (x_2 - x_1)^2 + \cdots + (x_T - x_{T-1})^2
$$
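A small sketch of the difference matrix and the wiggliness measure (numpy assumed; names are ours):

```python
import numpy as np

def difference_matrix(T):
    """(T-1) x T first-order difference matrix D, so (D @ x)[i] = x[i+1] - x[i]."""
    return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

x = np.array([1.0, 2.0, 4.0, 4.0])
D = difference_matrix(4)
print(D @ x)                        # [1. 2. 0.]  (same as np.diff(x))
print(np.sum((D @ x) ** 2))         # wiggliness ||Dx||^2 = 5.0
```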

Outline

Introduction

Linear operations

Least-squares

Prediction

Least-squares


De-meaning

de-meaning a time series means subtracting its mean:

$$
\tilde{x} = x - \mathbf{avg}(x)\,\mathbf{1}
$$

$\mathbf{rms}(\tilde{x}) = \mathbf{std}(x)$

this is the least-squares fit with a constant
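A one-screen numpy check of the de-meaning identity above (toy data is ours):

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0, 6.0])
x_demeaned = x - x.mean()                    # subtract avg(x) from every entry
rms = np.sqrt(np.mean(x_demeaned ** 2))      # rms of the de-meaned series
print(np.isclose(rms, x.std()))              # True: rms(x - avg(x) 1) = std(x)
```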

Straight-line fit and de-trending

fit data $(1, x_1), \ldots, (T, x_T)$ with affine model $x_t \approx a + bt$
(also called straight-line fit)

$b$ is called the trend

$a + bt$ is called the trend line

de-trending a time series means subtracting its straight-line fit

de-trended time series shows variations above and below the straight-line fit
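A minimal numpy sketch of the straight-line fit and de-trending (the function name and synthetic data are ours):

```python
import numpy as np

def detrend(x):
    """Fit x_t ≈ a + b t by least squares; return a, b, and the de-trended series."""
    T = len(x)
    t = np.arange(1, T + 1)
    A = np.column_stack([np.ones(T), t])     # columns: constant, time index
    (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
    return a, b, x - (a + b * t)

x = 0.5 + 0.1 * np.arange(1, 101) + 0.2 * np.random.randn(100)
a, b, resid = detrend(x)
print(a, b)                                  # roughly 0.5 and 0.1
```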

Straight-line fit on Apple log price

[plots: Apple log price with its trend line (top), and the de-trended residual (bottom), over about 9000 trading days]

Periodic time series

let $P$-vector $z$ be one period of periodic time series

$$
x^{\mathrm{per}} = (z, z, \ldots, z)
$$

(we assume $T$ is a multiple of $P$)

express as $x^{\mathrm{per}} = Az$ with

$$
A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
$$

Extracting a periodic component

given (non-periodic) time series $x$, choose $z$ to minimize $\|x - Az\|^2$

gives best least-squares fit with periodic time series

simple solution: average periods of original:

$$
z = (1/k) A^T x, \qquad k = T/P
$$

e.g., to get the entry of $z$ for January 9, average all $x_i$'s with date January 9
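A sketch of the averaging formula $z = (1/k)A^T x$, implemented by reshaping (numpy assumed; names and the synthetic series are ours):

```python
import numpy as np

def periodic_component(x, P):
    """Best periodic fit: average the k = T/P periods of x (assumes T % P == 0)."""
    T = len(x)
    k = T // P
    z = x.reshape(k, P).mean(axis=0)         # z = (1/k) A^T x
    return z, np.tile(z, k)                  # one period, and the full fit A z

x = np.sin(2 * np.pi * np.arange(4 * 365) / 365) + 0.3 * np.random.randn(4 * 365)
z, x_per = periodic_component(x, 365)
```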

Periodic component of Melbourne temperature

[plot: periodic component of Melbourne daily temperature, over about 4000 days]

Extracting a periodic component with smoothing

can add smoothing to periodic fit by minimizing

$$
\|x - Az\|^2 + \lambda \|Dz\|^2
$$

$\lambda > 0$ is smoothing parameter

$D$ is $P \times P$ circular difference matrix

$$
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}
$$

$\lambda$ is chosen visually or by validation
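One way to solve the regularized problem is to stack it as an ordinary least-squares problem; a hedged numpy sketch (names are ours, not from the slides):

```python
import numpy as np

def smoothed_periodic_fit(x, P, lam):
    """Minimize ||x - A z||^2 + lam ||D z||^2 over the P-vector z,
    with A = stacked identities and D = circular difference matrix."""
    T = len(x)
    k = T // P
    A = np.tile(np.eye(P), (k, 1))                        # T x P
    D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)         # P x P, circular difference
    # equivalent stacked problem: [A; sqrt(lam) D] z ≈ [x; 0]
    M = np.vstack([A, np.sqrt(lam) * D])
    b = np.concatenate([x, np.zeros(P)])
    z, *_ = np.linalg.lstsq(M, b, rcond=None)
    return z
```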

Choosing smoothing via validation

split data into train and test sets, e.g., test set is last period ($P$ entries)

train model on train set, and test on the test set

choose $\lambda$ to (approximately) minimize error on the test set
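A sketch of validating $\lambda$ using the last period as the test set; it reuses the hypothetical smoothed_periodic_fit from the previous sketch:

```python
import numpy as np

def validate_lambda(x, P, lambdas):
    """For each candidate lambda, fit on all but the last period and
    report RMS error on the held-out last period."""
    x_train, x_test = x[:-P], x[-P:]
    errors = []
    for lam in lambdas:
        z = smoothed_periodic_fit(x_train, P, lam)   # defined in the earlier sketch
        errors.append(np.sqrt(np.mean((x_test - z) ** 2)))
    return errors

# example sweep over a logarithmic grid of smoothing parameters
# errs = validate_lambda(x, 365, np.logspace(0, 4, 20))
```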

Validation of smoothing for Melbourne temperature


trained on first 8 years; tested on last two years

[plot: test RMS error vs. smoothing parameter λ (log scale)]

Periodic component of temperature with smoothing


zoomed on test set, using $\lambda = 30$

[plot: temperature and smoothed periodic component on the test set, days 2900–3600]

Outline

Introduction

Linear operations

Least-squares

Prediction

Prediction


Prediction

goal: predict or guess $x_{t+K}$ given $x_1, \ldots, x_t$

$K = 1$ is one-step-ahead prediction

prediction is often denoted $\hat{x}_{t+K}$, or more explicitly $\hat{x}_{(t+K \mid t)}$
(estimate of $x_{t+K}$ made at time $t$)

$\hat{x}_{t+K} - x_{t+K}$ is prediction error

applications: predict

  asset price
  product demand
  electricity usage
  economic activity
  position of vehicle

Some simple predictors

constant: $\hat{x}_{t+K} = a$

current value: $\hat{x}_{t+K} = x_t$

linear (affine) extrapolation from last two values:

$$
\hat{x}_{t+K} = x_t + K(x_t - x_{t-1})
$$

average to date: $\hat{x}_{t+K} = \mathbf{avg}(x_{1:t})$

$(M+1)$-period rolling average: $\hat{x}_{t+K} = \mathbf{avg}(x_{(t-M):t})$

straight-line fit to date (i.e., based on $x_{1:t}$)
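The simple predictors above, collected in one hedged numpy sketch (names are ours):

```python
import numpy as np

def simple_predictions(x, t, K, M=10):
    """Predict x_{t+K} from the history x_1, ..., x_t (x is 0-indexed here)."""
    h = x[:t]                                # observed history
    return {
        "current value": h[-1],
        "extrapolation": h[-1] + K * (h[-1] - h[-2]),
        "average to date": h.mean(),
        "rolling average": h[-(M + 1):].mean(),
    }
```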

Auto-regressive predictor

auto-regressive predictor:

$$
\hat{x}_{t+K} = (x_t, x_{t-1}, \ldots, x_{t-M})^T \beta + v
$$

$M$ is memory length

$(M+1)$-vector $\beta$ gives predictor weights; $v$ is offset

prediction $\hat{x}_{t+K}$ is affine function of past window $x_{(t-M):t}$
(which of the simple predictors above have this form?)
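A hedged sketch of applying an auto-regressive predictor, given coefficients $\beta$ and offset $v$ (the window ordering follows $(x_t, \ldots, x_{t-M})$ as above; names are ours):

```python
import numpy as np

def ar_predict(history, beta, v):
    """One AR prediction: history is x_1, ..., x_t; beta is the (M+1)-vector of
    weights for (x_t, x_{t-1}, ..., x_{t-M}); v is the offset."""
    M = len(beta) - 1
    window = history[-(M + 1):][::-1]        # (x_t, x_{t-1}, ..., x_{t-M})
    return float(window @ beta + v)
```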

Least-squares fitting of auto-regressive models

choose coefficients $\beta$, offset $v$ via least-squares (regression)

regressors are $(M+1)$-vectors

$$
x_{1:(M+1)}, \; \ldots, \; x_{(N-M):N}
$$

outcomes are numbers

$$
x_{M+K+1}, \; \ldots, \; x_{N+K}
$$

can add regularization on $\beta$
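A minimal least-squares fit of the AR coefficients, building one regression row per window (numpy assumed; names and indexing conventions are ours):

```python
import numpy as np

def fit_ar(x, M, K=1):
    """Regress x_{t+K} on the window (x_t, ..., x_{t-M}); returns (beta, v)."""
    T = len(x)
    rows, targets = [], []
    for t in range(M, T - K):                # 0-indexed position of "today"
        rows.append(x[t - M:t + 1][::-1])    # (x_t, x_{t-1}, ..., x_{t-M})
        targets.append(x[t + K])
    A = np.column_stack([np.array(rows), np.ones(len(rows))])   # last column: offset
    theta, *_ = np.linalg.lstsq(A, np.array(targets), rcond=None)
    return theta[:-1], theta[-1]
```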

Evaluating predictions with validation

for simple methods: evaluate RMS prediction error

for more sophisticated methods:

  split data into a training set and a test set (usually sequential)
  train predictor on training data
  test on test data

Example

predict Texas energy usage one step ahead ($K = 1$)

train on first 10 months, test on last 2

Coefficients
using $M = 100$

$\beta_0$ is the coefficient for today

[plot: AR coefficients vs. lag, lags 0–100]

Auto-regressive prediction results

[plot: auto-regressive prediction results on part of the test set (interval index 3.0–3.5 ×10^4)]

Auto-regressive prediction results


showing the residual

[plot: prediction residual on part of the test set (interval index 3.0–3.5 ×10^4)]

Auto-regressive prediction results

predictor                        RMS error
average (constant)               1.20
current value                    0.119
auto-regressive (M = 10)         0.073
auto-regressive (M = 100)        0.051

Autoregressive model on residuals

fit a model to the time series, e.g., linear or periodic


subtract this model from the original signal to compute residuals

apply auto-regressive model to predict residuals

can add predicted residuals back to model to obtain predictions

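A sketch combining the steps above; it reuses the hypothetical periodic_component and fit_ar helpers from the earlier sketches:

```python
import numpy as np

def predict_with_residual_ar(x, P, M, K=1):
    """Periodic baseline plus an AR prediction of the next residual."""
    z, x_per = periodic_component(x, P)      # baseline periodic fit (earlier sketch)
    r = x - x_per                            # residual time series
    beta, v = fit_ar(r, M, K)                # AR model fit on the residuals
    r_hat = r[-(M + 1):][::-1] @ beta + v    # predicted residual K steps ahead
    return z[(len(x) + K - 1) % P] + r_hat   # add back the periodic value
```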

Example
Melbourne temperature data residuals

zoomed on 100 days in test set

[plot: temperature residuals, days 3000–3100]

Auto-regressive prediction of residuals

[plot: auto-regressive prediction of the residuals, days 3000–3100]

Prediction results for Melbourne temperature

tested on last two years


predictor                                 RMS error
average                                   4.12
current value                             2.57
periodic (no smoothing)                   2.71
periodic (smoothing, λ = 30)              2.62
auto-regressive (M = 3)                   2.44
auto-regressive (M = 20)                  2.27
auto-regressive on residual (M = 20)      2.22
