This continues with the Sunspots section of the ARMA Notebook example.

Note: here we consider the raw Sunspot series to match the ARMA example, although many
sources in the literature apply a transformation to the series before modeling.
In [ ]:
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.setar_model as setar_model
In [ ]:
print sm.datasets.sunspots.NOTE

In [ ]:
dta = sm.datasets.sunspots.load_pandas().data

In [ ]:
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]

In [ ]:
dta.plot(figsize=(12,8));

First we'll fit an AR(3) process to the data as in the ARMA Notebook Example.
In [ ]:
arma_mod30 = sm.tsa.ARMA(dta, (3,0)).fit()
print arma_mod30.params
print arma_mod30.aic, arma_mod30.bic, arma_mod30.hqic

To try to capture nonlinearities, we'll fit a SETAR(2) model to the data, which allows for two
regimes, and we let each regime be an AR(3) process. Here we're not specifying the delay or
threshold values, so they'll be optimally selected as part of fitting the model.
Note: In the summary, the \gamma parameter(s) are the threshold value(s).
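For reference, one common way to write a two-regime model with an AR(3) in each regime is sketched below. The symbols $\phi$, $d$ and $\varepsilon_t$ are notation assumed here rather than taken from the notebook, and which side of the threshold receives the equality is a convention that may differ from the one used by `setar_model`.

```latex
y_t =
\begin{cases}
\phi_0^{(1)} + \sum_{i=1}^{3} \phi_i^{(1)} y_{t-i} + \varepsilon_t, & y_{t-d} \le \gamma, \\
\phi_0^{(2)} + \sum_{i=1}^{3} \phi_i^{(2)} y_{t-i} + \varepsilon_t, & y_{t-d} > \gamma,
\end{cases}
```

Here $d$ is the delay and $\gamma$ is the threshold reported in the summary.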
In [ ]:
setar_mod23 = setar_model.SETAR(dta, 2, 3).fit()
print setar_mod23.summary()

Note that both the AIC and the BIC prefer the SETAR model to the AR model.
We can also directly test for the appropriate model, noting that an AR(3) is the same as a
SETAR(1;1,3), so the specifications are nested.
Note: this is a bootstrapped test, so it is rather slow until improvements can be made.
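To see why a bootstrapped test is slow, here is a minimal, self-contained sketch of one way such a test can work; it is not the implementation behind `order_test()`. It assumes a residual bootstrap under the linear AR null and a sup-F statistic over a grid of candidate thresholds. The helper names (`lagmat`, `ols`, `sup_f_stat`, `bootstrap_order_test`) and the simulated series are hypothetical.

```python
import numpy as np

def lagmat(y, p):
    """Regressors [1, y_{t-1}, ..., y_{t-p}] and targets y_t for t = p, ..., n-1."""
    n = len(y)
    cols = [np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)]
    return np.column_stack(cols), y[p:]

def ols(X, z):
    """Least-squares coefficients, residuals and sum of squared residuals."""
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    resid = z - X.dot(beta)
    return beta, resid, resid.dot(resid)

def sup_f_stat(y, p, d, trim=0.15, n_grid=25):
    """Largest F statistic for AR(p) against a two-regime SETAR (needs 1 <= d <= p)."""
    X, z = lagmat(y, p)
    q = X[:, d]                 # threshold variable y_{t-d}
    ssr0 = ols(X, z)[2]         # restricted fit: a single regime
    ssr1 = ssr0
    for gamma in np.quantile(q, np.linspace(trim, 1 - trim, n_grid)):
        low = q <= gamma
        ssr1 = min(ssr1, ols(X[low], z[low])[2] + ols(X[~low], z[~low])[2])
    return len(z) * (ssr0 - ssr1) / ssr1

def bootstrap_order_test(y, p=3, d=1, n_boot=99, seed=0):
    """Residual bootstrap of the sup-F statistic under the linear AR(p) null."""
    rng = np.random.RandomState(seed)
    X, z = lagmat(y, p)
    beta, resid, _ = ols(X, z)
    resid = resid - resid.mean()
    f_obs = sup_f_stat(y, p, d)
    f_boot = np.empty(n_boot)
    for b in range(n_boot):
        shocks = rng.choice(resid, size=len(y))
        y_star = np.array(y, dtype=float)   # first p observed values act as start-up
        for t in range(p, len(y)):
            y_star[t] = beta[0] + beta[1:].dot(y_star[t - p:t][::-1]) + shocks[t]
        f_boot[b] = sup_f_stat(y_star, p, d)   # full threshold search per replication
    return f_obs, np.mean(f_boot >= f_obs), f_boot

# Simulated two-regime series (illustrative only, not the sunspot data)
rng = np.random.RandomState(42)
y = np.zeros(300)
for t in range(1, 300):
    phi = 0.8 if y[t - 1] <= 0 else -0.8
    y[t] = phi * y[t - 1] + rng.standard_normal()

f_obs, pvalue, f_boot = bootstrap_order_test(y)
print("bootstrap p-value: %.3f" % pvalue)
```

Every bootstrap replication repeats the full threshold search, so the cost grows with the number of replications times the number of candidate thresholds.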
In [ ]:
setar_mod23 = setar_model.SETAR(dta, 2, 3).fit()

f_stat, pvalue, bs_f_stats = setar_mod23.order_test()  # by default tests against SETAR(1)
print pvalue

The null hypothesis is a SETAR(1), so it looks like we can safely reject it in favor of the
SETAR(2) alternative.
One thing to note, though, is that by default order_test() assumes homoskedasticity, which may
be unreasonable here. So we can force the test to allow for heteroskedasticity of general form
(in this case it doesn't look like it matters, however).
In [ ]:
f_stat_h, pvalue_h, bs_f_stats_h = setar_mod23.order_test(heteroskedasticity='g')
print pvalue_h
In [ ]:
setar_mod23.resid.plot(figsize=(10,5));

Compared to an ARMA model, two new types of parameters are estimated here: the delay and the
threshold(s). The delay parameter selects which lag of the process to use as the threshold
variable, and the thresholds indicate which values of the threshold variable separate the
datapoints into the (here two) regimes.
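The partition described above can be sketched in a few lines. This is only an illustration: the function name `assign_regimes`, the toy series, and the threshold value of 50 are hypothetical and are not the values estimated for the sunspot data, and the convention that a value equal to the threshold falls in the lower regime is an assumption.

```python
import numpy as np

def assign_regimes(y, delay, thresholds):
    """Regime index for each y_t, t = delay, ..., n-1, based on y_{t-delay}."""
    y = np.asarray(y, dtype=float)
    threshold_var = y[:-delay]   # y_{t-delay} aligned with y[delay:]
    # searchsorted counts the thresholds strictly below each value, so a value
    # equal to a threshold is placed in the lower regime
    return np.searchsorted(np.sort(thresholds), threshold_var, side='left')

y = [10, 60, 80, 30, 20, 70]
regimes = assign_regimes(y, delay=2, thresholds=[50])
print(regimes.tolist())   # regime of y[2], y[3], y[4], y[5]
```

With a delay of 2, the regime of `y[2]` is decided by `y[0]`, the regime of `y[3]` by `y[1]`, and so on; passing more than one threshold yields more than two regimes.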
The confidence interval for the threshold parameter is generated (as in Hansen (1997)) by
inverting the likelihood ratio statistic: the selected threshold value is compared against
each alternative threshold value, and the resulting statistic is compared against the critical
value for the desired confidence level. We can see that graphically by plotting the likelihood
ratio sequence against each alternate threshold.
Alternate thresholds that correspond to likelihood ratio statistics less than the critical value are
included in a confidence set, and the lower and upper bounds of the confidence interval are the
smallest and largest threshold, respectively, in the confidence set.
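The construction of the confidence set can be sketched as follows. This is a simplified illustration under stated assumptions, not the code behind `plot_threshold_ci()`: it assumes homoskedastic errors, uses the likelihood ratio form n (S(gamma) - S(gamma_hat)) / S(gamma_hat) built from sums of squared residuals, and takes the critical value from the closed form -2 log(1 - sqrt(level)), which gives about 7.35 at the 95% level. The function names and the simulated series are hypothetical.

```python
import numpy as np

def threshold_lr_sequence(y, p, d, trim=0.15):
    """Candidate thresholds and their likelihood ratio statistics."""
    n = len(y)
    X = np.column_stack([np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)])
    z = y[p:]
    q = X[:, d]                                   # threshold variable y_{t-d}
    lo, hi = np.quantile(q, [trim, 1 - trim])     # keep enough points in each regime
    candidates = np.unique(q[(q >= lo) & (q <= hi)])
    ssr = np.empty(len(candidates))
    for j, gamma in enumerate(candidates):
        total = 0.0
        for mask in (q <= gamma, q > gamma):      # separate AR(p) fit in each regime
            beta = np.linalg.lstsq(X[mask], z[mask], rcond=None)[0]
            total += np.sum((z[mask] - X[mask].dot(beta)) ** 2)
        ssr[j] = total
    lr = len(z) * (ssr - ssr.min()) / ssr.min()
    return candidates, lr

def threshold_confidence_interval(candidates, lr, level=0.95):
    """Selected threshold plus the bounds of the confidence set."""
    crit = -2.0 * np.log(1.0 - np.sqrt(level))
    in_set = candidates[lr <= crit]               # thresholds that are not rejected
    return candidates[np.argmin(lr)], in_set.min(), in_set.max(), crit

# Simulated two-regime AR(1) series with a true threshold of 0 (illustrative only)
rng = np.random.RandomState(0)
y = np.zeros(400)
for t in range(1, 400):
    const = 2.0 if y[t - 1] <= 0.0 else -2.0
    y[t] = const + 0.5 * y[t - 1] + rng.standard_normal()

candidates, lr = threshold_lr_sequence(y, p=1, d=1)
gamma_hat, lower, upper, crit = threshold_confidence_interval(candidates, lr)
print("threshold estimate %.3f, 95%% interval [%.3f, %.3f]" % (gamma_hat, lower, upper))
```

Plotting `lr` against `candidates` with a horizontal line at `crit` gives the kind of picture described above: the interval runs from the smallest to the largest candidate whose statistic lies below the line.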
In [ ]:
setar_mod23.plot_threshold_ci(0, figwidth=10, figheight=5);
As in the ARMA Notebook Example, we can take a look at in-sample dynamic prediction and
out-of-sample forecasting.
In [ ]:
predict_arma_mod30 = arma_mod30.predict('1990', '2012', dynamic=True)
predict_setar_mod23 = setar_mod23.predict('1990', '2012', dynamic=True)
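As a rough illustration of what dynamic prediction means for a threshold model, the sketch below feeds each forecast back in as a lagged value, so that the forecasts themselves determine which regime applies at the next step. It is a plug-in sketch that ignores future shocks and is not necessarily how `predict()` is implemented; the function name `setar_dynamic_forecast` and all parameter values are hypothetical.

```python
import numpy as np

def setar_dynamic_forecast(history, params, threshold, delay, steps):
    """Iterate a two-regime SETAR forward for steps >= 1 periods.

    params holds two coefficient arrays [const, phi_1, ..., phi_p],
    lower regime first.
    """
    p = len(params[0]) - 1
    path = list(history[-max(p, delay):])
    for _ in range(steps):
        # the regime depends on the (possibly forecasted) value `delay` steps back
        regime = 0 if path[-delay] <= threshold else 1
        coefs = params[regime]
        lags = path[-1:-p - 1:-1]            # y_{t-1}, ..., y_{t-p}
        path.append(coefs[0] + np.dot(coefs[1:], lags))
    return np.array(path[-steps:])

params = [np.array([2.0, 0.5]),   # lower regime: y_t = 2 + 0.5 y_{t-1}
          np.array([0.0, 0.5])]   # upper regime: y_t = 0.5 y_{t-1}
forecast = setar_dynamic_forecast([0.0], params, threshold=2.0, delay=1, steps=5)
print(forecast.tolist())
```

In this toy example the path keeps crossing the threshold and switching regimes instead of settling at a single level, which is the kind of behavior that lets a threshold model produce sustained cycles in a dynamic prediction.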
In [ ]:
ax = dta.loc['1950':].plot(figsize=(12,8))
ax = predict_arma_mod30.plot(ax=ax, style='r--', linewidth=2,
                             label='AR(3) Dynamic Prediction');
ax = predict_setar_mod23.plot(ax=ax, style='k--', linewidth=2,
                              label='SETAR(2;3,3) Dynamic Prediction');
ax.legend();

ax.axis((-20.0, 38.0, -4.0, 200.0));

It appears the dynamic prediction from the SETAR model is able to track the observed
datapoints a little better than the AR(3) model. We can compare with the root mean square
forecast error, and see that the SETAR does slightly better.
In [ ]:
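def rmsfe(y, yhat):
    # root mean square forecast error; dates missing from either series
    # become NaN after alignment and are skipped by mean()
    return np.sqrt((y.sub(yhat)**2).mean())

print 'AR(3):        ', rmsfe(dta.SUNACTIVITY, predict_arma_mod30)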
print 'SETAR(2;3,3): ', rmsfe(dta.SUNACTIVITY, predict_setar_mod23)
If we extend the forecast window, however, it is clear that the SETAR model is the only one that
even begins to fit the shape of the data, because the data is cyclic.
In [ ]:
predict_arma_mod30_long = arma_mod30.predict('1960', '2012', dynamic=True)
predict_setar_mod23_long = setar_mod23.predict('1960', '2012', dynamic=True)
ax = dta.loc['1950':].plot(figsize=(12,8))
ax = predict_arma_mod30_long.plot(ax=ax, style='r--', linewidth=2,
                                  label='AR(3) Dynamic Prediction');
ax = predict_setar_mod23_long.plot(ax=ax, style='k--', linewidth=2,
                                   label='SETAR(2;3,3) Dynamic Prediction');
ax.legend();
ax.axis((-20.0, 38.0, -4.0, 200.0));
In [ ]:
print 'AR(3):        ', rmsfe(dta.SUNACTIVITY, predict_arma_mod30_long)
print 'SETAR(2;3,3): ', rmsfe(dta.SUNACTIVITY, predict_setar_mod23_long)
