
Time Series

Karanveer Mohan

Keegan Go

Stephen Boyd

EE103
Stanford University

November 12, 2015

Outline

Introduction

Linear operations

Least-squares

Prediction

Introduction

Time series data

represent time series $x_1, \ldots, x_T$ as $T$-vector $x$

$x_t$ is the value of some quantity at time (period, epoch) $t$, $t = 1, \ldots, T$

examples:

  average temperature at some location on day $t$
  closing price of some stock on (trading) day $t$
  hourly number of users on a website
  altitude of an airplane every 10 seconds
  enrollment in a class every quarter

vector time series: $x_t$ is an $n$-vector; can represent as $T \times n$ matrix


Types of time series

time series can be


smoothly varying, or more wiggly and random

roughly periodic (e.g., hourly temperature)

growing or shrinking (or both)

random but roughly continuous

(these are vague labels)


Melbourne temperature
daily measurements, for 10 years

you can see seasonal (yearly) periodicity


[plot: Melbourne daily temperature over about 4000 days]

Melbourne temperature

zoomed to one year


[plot: daily temperature, zoomed to one year]

Apple stock price


$\log_{10}$ of Apple daily share price, over 30 years, 250 trading days/year

you can see (not steady) growth


[plot: $\log_{10}$ Apple share price over about 9000 trading days]

Log price of Apple

zoomed to one year


[plot: $\log_{10}$ share price, zoomed to one year (days 6000–6250)]

Electricity usage in (one region of) Texas


total usage in 15-minute intervals, over 1 year

you can see variation over the year


[plot: electricity usage per 15-minute interval over one year (about 3.5×10^4 intervals)]

Electricity usage in (one region of) Texas


zoomed to 1 month

you can see daily periodicity and weekend/weekday variation


[plot: electricity usage, zoomed to one month]

Outline

Introduction

Linear operations

Least-squares

Prediction

Linear operations


Down-sampling

$k\times$ down-sampled time series selects every $k$th entry of $x$

can be written as $y = Ax$

for $2\times$ down-sampling, $T$ even,

$$
A = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}
$$
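As a concrete illustration of the selection-matrix view above, here is a minimal numpy sketch (not from the slides; the function name and toy data are ours):

```python
import numpy as np

def downsample_matrix(T, k):
    """Selection matrix A so that y = A @ x keeps every k-th entry of x."""
    assert T % k == 0
    A = np.zeros((T // k, T))
    A[np.arange(T // k), np.arange(0, T, k)] = 1.0
    return A

x = np.arange(10.0)              # toy time series x_1, ..., x_10
y = downsample_matrix(10, 2) @ x
print(y)                         # same result as x[::2]
```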

Down-sampling on Apple log price


$4\times$ down-sample

[plot: Apple log price, 4× down-sampled, days 5100–5120]

Up-sampling

$k\times$ (linear) up-sampling interpolates between entries of $x$

can be written as $y = Ax$

for $2\times$ up-sampling,

$$
A = \begin{bmatrix}
1 & & & & \\
1/2 & 1/2 & & & \\
 & 1 & & & \\
 & 1/2 & 1/2 & & \\
 & & \ddots & & \\
 & & & 1 & \\
 & & & 1/2 & 1/2 \\
 & & & & 1
\end{bmatrix}
$$
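A minimal numpy sketch of 2× linear up-sampling via the interpolation matrix above (the helper name and toy series are ours):

```python
import numpy as np

def upsample_matrix(T):
    """2x linear up-sampling matrix: output has 2*T - 1 entries;
    original samples are kept and midpoints average adjacent samples."""
    A = np.zeros((2 * T - 1, T))
    A[::2, :] = np.eye(T)                    # even output rows copy the samples
    A[1::2, :-1] += 0.5 * np.eye(T - 1)      # odd rows average the two neighbors
    A[1::2, 1:] += 0.5 * np.eye(T - 1)
    return A

x = np.array([1.0, 3.0, 2.0])
print(upsample_matrix(3) @ x)                # [1.  2.  3.  2.5 2. ]
```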

Up-sampling on Apple log price


$4\times$ up-sample

[plot: Apple log price, 4× up-sampled, days 5100–5120]

Smoothing

$k$-long moving average $y$ of $x$ is given by

$$
y_i = \frac{1}{k}\,(x_i + x_{i+1} + \cdots + x_{i+k-1}), \qquad i = 1, \ldots, T - k + 1
$$

can express as $y = Ax$, e.g., for $k = 3$,

$$
A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
 & 1/3 & 1/3 & 1/3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1/3 & 1/3 & 1/3
\end{bmatrix}
$$

can also have trailing or centered smoothing
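A short sketch of the $k$-long moving average using convolution (numpy assumed; the function name is ours):

```python
import numpy as np

def moving_average(x, k):
    """k-long moving average: y_i = (x_i + ... + x_{i+k-1}) / k, length T - k + 1."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

x = np.array([1.0, 2.0, 6.0, 2.0, 1.0])
print(moving_average(x, 3))                  # approximately [3., 3.33, 3.]
```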

Melbourne daily temperature smoothed


centered smoothing with window size 41

[plot: Melbourne daily temperature and its centered 41-day moving average]

First-order differences

(first-order) difference between adjacent entries

discrete analog of derivative

express as $y = Dx$, where $D$ is the $(T-1) \times T$ difference matrix

$$
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{bmatrix}
$$

$\|Dx\|^2$ is a measure of the wiggliness of $x$

$$
\|Dx\|^2 = (x_2 - x_1)^2 + \cdots + (x_T - x_{T-1})^2
$$
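A small sketch of the difference matrix and the wiggliness measure (numpy assumed; names are ours):

```python
import numpy as np

def difference_matrix(T):
    """(T-1) x T first-order difference matrix D, so (D @ x)[i] = x[i+1] - x[i]."""
    return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

x = np.array([1.0, 2.0, 4.0, 4.0])
D = difference_matrix(4)
print(D @ x)                        # [1. 2. 0.]  (same as np.diff(x))
print(np.sum((D @ x) ** 2))         # wiggliness ||Dx||^2 = 5.0
```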

Outline

Introduction

Linear operations

Least-squares

Prediction

Least-squares


De-meaning

de-meaning a time series means subtracting its mean:

$$
\tilde{x} = x - \mathbf{avg}(x)\,\mathbf{1}
$$

$\mathbf{rms}(\tilde{x}) = \mathbf{std}(x)$

this is the least-squares fit with a constant
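A one-screen numpy check of the de-meaning identity above (toy data is ours):

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0, 6.0])
x_demeaned = x - x.mean()                    # subtract avg(x) from every entry
rms = np.sqrt(np.mean(x_demeaned ** 2))      # rms of the de-meaned series
print(np.isclose(rms, x.std()))              # True: rms(x - avg(x) 1) = std(x)
```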

Straight-line fit and de-trending

fit data $(1, x_1), \ldots, (T, x_T)$ with affine model $x_t \approx a + bt$
(also called straight-line fit)

$b$ is called the trend

$a + bt$ is called the trend line

de-trending a time series means subtracting its straight-line fit

de-trended time series shows variations above and below the straight-line fit
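A minimal numpy sketch of the straight-line fit and de-trending (the function name and synthetic data are ours):

```python
import numpy as np

def detrend(x):
    """Fit x_t ≈ a + b t by least squares; return a, b, and the de-trended series."""
    T = len(x)
    t = np.arange(1, T + 1)
    A = np.column_stack([np.ones(T), t])     # columns: constant, time index
    (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
    return a, b, x - (a + b * t)

x = 0.5 + 0.1 * np.arange(1, 101) + 0.2 * np.random.randn(100)
a, b, resid = detrend(x)
print(a, b)                                  # roughly 0.5 and 0.1
```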

Straight-line fit on Apple log price

[plots: Apple log price with its trend line (top), and the de-trended residual (bottom), over about 9000 trading days]

Periodic time series

let $P$-vector $z$ be one period of periodic time series

$$
x^{\mathrm{per}} = (z, z, \ldots, z)
$$

(we assume $T$ is a multiple of $P$)

express as $x^{\mathrm{per}} = Az$ with

$$
A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
$$

Extracting a periodic component

given (non-periodic) time series $x$, choose $z$ to minimize $\|x - Az\|^2$

gives best least-squares fit with periodic time series

simple solution: average periods of original:

$$
z = (1/k) A^T x, \qquad k = T/P
$$

e.g., to get the entry of $z$ for January 9, average all $x_i$'s with date January 9
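A sketch of the averaging formula $z = (1/k)A^T x$, implemented by reshaping (numpy assumed; names and the synthetic series are ours):

```python
import numpy as np

def periodic_component(x, P):
    """Best periodic fit: average the k = T/P periods of x (assumes T % P == 0)."""
    T = len(x)
    k = T // P
    z = x.reshape(k, P).mean(axis=0)         # z = (1/k) A^T x
    return z, np.tile(z, k)                  # one period, and the full fit A z

x = np.sin(2 * np.pi * np.arange(4 * 365) / 365) + 0.3 * np.random.randn(4 * 365)
z, x_per = periodic_component(x, 365)
```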

Periodic component of Melbourne temperature

[plot: periodic component of Melbourne daily temperature, over about 4000 days]

Extracting a periodic component with smoothing

can add smoothing to periodic fit by minimizing

$$
\|x - Az\|^2 + \lambda \|Dz\|^2
$$

$\lambda > 0$ is smoothing parameter

$D$ is $P \times P$ circular difference matrix

$$
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}
$$

$\lambda$ is chosen visually or by validation
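One way to solve the regularized problem is to stack it as an ordinary least-squares problem; a hedged numpy sketch (names are ours, not from the slides):

```python
import numpy as np

def smoothed_periodic_fit(x, P, lam):
    """Minimize ||x - A z||^2 + lam ||D z||^2 over the P-vector z,
    with A = stacked identities and D = circular difference matrix."""
    T = len(x)
    k = T // P
    A = np.tile(np.eye(P), (k, 1))                        # T x P
    D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)         # P x P, circular difference
    # equivalent stacked problem: [A; sqrt(lam) D] z ≈ [x; 0]
    M = np.vstack([A, np.sqrt(lam) * D])
    b = np.concatenate([x, np.zeros(P)])
    z, *_ = np.linalg.lstsq(M, b, rcond=None)
    return z
```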

Choosing smoothing via validation

split data into train and test sets, e.g., test set is last period ($P$ entries)

train model on train set, and test on the test set

choose $\lambda$ to (approximately) minimize error on the test set
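A sketch of validating $\lambda$ using the last period as the test set; it reuses the hypothetical smoothed_periodic_fit from the previous sketch:

```python
import numpy as np

def validate_lambda(x, P, lambdas):
    """For each candidate lambda, fit on all but the last period and
    report RMS error on the held-out last period."""
    x_train, x_test = x[:-P], x[-P:]
    errors = []
    for lam in lambdas:
        z = smoothed_periodic_fit(x_train, P, lam)   # defined in the earlier sketch
        errors.append(np.sqrt(np.mean((x_test - z) ** 2)))
    return errors

# example sweep over a logarithmic grid of smoothing parameters
# errs = validate_lambda(x, 365, np.logspace(0, 4, 20))
```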

Validation of smoothing for Melbourne temperature


trained on first 8 years; tested on last two years

[plot: test RMS error vs. smoothing parameter λ (log scale)]

Periodic component of temperature with smoothing


zoomed on test set, using $\lambda = 30$

[plot: temperature and smoothed periodic component on the test set, days 2900–3600]

Outline

Introduction

Linear operations

Least-squares

Prediction

Prediction


Prediction

goal: predict or guess $x_{t+K}$ given $x_1, \ldots, x_t$

$K = 1$ is one-step-ahead prediction

prediction is often denoted $\hat{x}_{t+K}$, or more explicitly $\hat{x}_{(t+K \mid t)}$
(estimate of $x_{t+K}$ made at time $t$)

$\hat{x}_{t+K} - x_{t+K}$ is prediction error

applications: predict

  asset price
  product demand
  electricity usage
  economic activity
  position of vehicle

Some simple predictors

constant: $\hat{x}_{t+K} = a$

current value: $\hat{x}_{t+K} = x_t$

linear (affine) extrapolation from last two values:

$$
\hat{x}_{t+K} = x_t + K(x_t - x_{t-1})
$$

average to date: $\hat{x}_{t+K} = \mathbf{avg}(x_{1:t})$

$(M+1)$-period rolling average: $\hat{x}_{t+K} = \mathbf{avg}(x_{(t-M):t})$

straight-line fit to date (i.e., based on $x_{1:t}$)
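The simple predictors above, collected in one hedged numpy sketch (names are ours):

```python
import numpy as np

def simple_predictions(x, t, K, M=10):
    """Predict x_{t+K} from the history x_1, ..., x_t (x is 0-indexed here)."""
    h = x[:t]                                # observed history
    return {
        "current value": h[-1],
        "extrapolation": h[-1] + K * (h[-1] - h[-2]),
        "average to date": h.mean(),
        "rolling average": h[-(M + 1):].mean(),
    }
```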

Auto-regressive predictor

auto-regressive predictor:

$$
\hat{x}_{t+K} = (x_t, x_{t-1}, \ldots, x_{t-M})^T \beta + v
$$

$M$ is memory length

$(M+1)$-vector $\beta$ gives predictor weights; $v$ is offset

prediction $\hat{x}_{t+K}$ is affine function of past window $x_{(t-M):t}$
(which of the simple predictors above have this form?)
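A hedged sketch of applying an auto-regressive predictor, given coefficients $\beta$ and offset $v$ (the window ordering follows $(x_t, \ldots, x_{t-M})$ as above; names are ours):

```python
import numpy as np

def ar_predict(history, beta, v):
    """One AR prediction: history is x_1, ..., x_t; beta is the (M+1)-vector of
    weights for (x_t, x_{t-1}, ..., x_{t-M}); v is the offset."""
    M = len(beta) - 1
    window = history[-(M + 1):][::-1]        # (x_t, x_{t-1}, ..., x_{t-M})
    return float(window @ beta + v)
```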

Least-squares fitting of auto-regressive models

choose coefficients $\beta$, offset $v$ via least-squares (regression)

regressors are $(M+1)$-vectors

$$
x_{1:(M+1)}, \; \ldots, \; x_{(N-M):N}
$$

outcomes are numbers

$$
x_{M+K+1}, \; \ldots, \; x_{N+K}
$$

can add regularization on $\beta$
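A minimal least-squares fit of the AR coefficients, building one regression row per window (numpy assumed; names and indexing conventions are ours):

```python
import numpy as np

def fit_ar(x, M, K=1):
    """Regress x_{t+K} on the window (x_t, ..., x_{t-M}); returns (beta, v)."""
    T = len(x)
    rows, targets = [], []
    for t in range(M, T - K):                # 0-indexed position of "today"
        rows.append(x[t - M:t + 1][::-1])    # (x_t, x_{t-1}, ..., x_{t-M})
        targets.append(x[t + K])
    A = np.column_stack([np.array(rows), np.ones(len(rows))])   # last column: offset
    theta, *_ = np.linalg.lstsq(A, np.array(targets), rcond=None)
    return theta[:-1], theta[-1]
```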

Evaluating predictions with validation

for simple methods: evaluate RMS prediction error

for more sophisticated methods:

  split data into a training set and a test set (usually sequential)
  train predictor on training data
  test on test data

Example

predict Texas energy usage one step ahead ($K = 1$)

train on first 10 months, test on last 2

Coefficients
using $M = 100$

$\beta_0$ is the coefficient for today

[plot: AR coefficients vs. lag, lags 0–100]

Auto-regressive prediction results

[plot: auto-regressive prediction results on part of the test set (interval index 3.0–3.5 ×10^4)]

Auto-regressive prediction results


showing the residual

[plot: prediction residual on part of the test set (interval index 3.0–3.5 ×10^4)]

Auto-regressive prediction results

predictor                        RMS error
average (constant)               1.20
current value                    0.119
auto-regressive (M = 10)         0.073
auto-regressive (M = 100)        0.051

Autoregressive model on residuals

fit a model to the time series, e.g., linear or periodic


subtract this model from the original signal to compute residuals

apply auto-regressive model to predict residuals

can add predicted residuals back to model to obtain predictions

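A sketch combining the steps above; it reuses the hypothetical periodic_component and fit_ar helpers from the earlier sketches:

```python
import numpy as np

def predict_with_residual_ar(x, P, M, K=1):
    """Periodic baseline plus an AR prediction of the next residual."""
    z, x_per = periodic_component(x, P)      # baseline periodic fit (earlier sketch)
    r = x - x_per                            # residual time series
    beta, v = fit_ar(r, M, K)                # AR model fit on the residuals
    r_hat = r[-(M + 1):][::-1] @ beta + v    # predicted residual K steps ahead
    return z[(len(x) + K - 1) % P] + r_hat   # add back the periodic value
```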

Example
Melbourne temperature data residuals

zoomed on 100 days in test set

[plot: temperature residuals, days 3000–3100]

Auto-regressive prediction of residuals

[plot: auto-regressive prediction of the residuals, days 3000–3100]

Prediction results for Melbourne temperature

tested on last two years


predictor                                 RMS error
average                                   4.12
current value                             2.57
periodic (no smoothing)                   2.71
periodic (smoothing, λ = 30)              2.62
auto-regressive (M = 3)                   2.44
auto-regressive (M = 20)                  2.27
auto-regressive on residual (M = 20)      2.22
