
Statistical, Econometric and Time Series Modeling in Finance

Nityanand Misra

Core Quantitative Strategies


Goldman Sachs (India) Securities Private Limited

March 21, 2009


Random Numbers Regression Models Time Series Models (G)ARCH Models Modeling Process Questions

Outline

1 Random Numbers

2 Regression Models

3 Time Series Models

4 (G)ARCH Models

5 Modeling Process

6 Questions

1 Random Numbers

Expectations

Motivation
Many problems in finance can be reduced to the computation of
E[g(X)], where g is a function of the random variable X.

Examples
Pricing a derivative security
Given the distribution f(x) and current price S of a stock,
what is the price of a call option on the stock at strike K
maturing at time T?
Hedging and Risk Management
Given a portfolio, what is the 10-day Value-at-Risk (VaR) of
the portfolio?
Given a contract, what is its Maximum Potential Exposure
(MPE)?

Expectations

Definition
If the probability density function of X is f(x), then
\[ E[g(X)] = \int g(x) f(x) \, dx \]

Theorem (Law of Large Numbers)

If $x_1, x_2, \ldots, x_n$ are random samples drawn from f(x), then
\[ a_n = \frac{1}{n} \sum_{i=1}^{n} g(x_i) \]
approximates E[g(X)], the error being of the order $O(n^{-1/2})$.
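
A minimal sketch of the sample-average estimator, assuming NumPy is available. The choice X ~ N(0, 1) with g(x) = x^2 (so that E[g(X)] = 1) and the helper name `mc_expectation` are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def mc_expectation(g, sampler, n, seed=0):
    """Estimate E[g(X)] by averaging g over n samples drawn by `sampler`."""
    rng = np.random.default_rng(seed)
    values = g(sampler(rng, n))
    estimate = values.mean()
    std_error = values.std(ddof=1) / np.sqrt(n)   # shrinks like n^(-1/2)
    return estimate, std_error

# Illustrative choice: X ~ N(0, 1), g(x) = x^2, so E[g(X)] = 1.
estimate, std_error = mc_expectation(
    lambda x: x**2, lambda rng, n: rng.standard_normal(n), n=100_000)
print(estimate, std_error)
```

Quadrupling n should roughly halve the reported standard error.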

Random Number Generation

Random Number Generation Methods


Common methods
Inversion Method
Transformation Method
Acceptance-Rejection Method
Special methods
Markov Chain Monte Carlo (MCMC)
Difficult-to-sample distributions
Distributions known indirectly
Quasi-Random numbers
Faster convergence than independent random numbers

Random Number Generation: Common Methods

Inversion of CDF
Given a random deviate u from U(0, 1), one can get a random
deviate x from the distribution with CDF F by setting $x = F^{-1}(u)$.

Example
A random deviate from the exponential distribution, whose CDF
is given by $F(x) = 1 - e^{-\lambda x}$, can be generated from u by taking
\[ x = -\frac{\log(1 - u)}{\lambda} \]
Since $1 - u \sim U(0, 1)$, one can simply take $x = -\log(u)/\lambda$.

Problem
An analytical expression for $F^{-1}(u)$ is not always obtainable.
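
The exponential example can be sketched as follows, assuming NumPy. The function name `sample_exponential` and the rate value 2.0 are illustrative assumptions.

```python
import numpy as np

def sample_exponential(lam, n, seed=0):
    """Draw n exponential(lam) deviates by inverting F(x) = 1 - exp(-lam * x)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)              # u lies in [0, 1)
    return -np.log1p(-u) / lam     # x = -log(1 - u) / lam, finite for every u

samples = sample_exponential(lam=2.0, n=200_000)
print(samples.mean())  # should be close to 1 / lam = 0.5
```

Using 1 - u rather than u here is only a guard against log(0), since the generator can return exactly 0 but never exactly 1.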

Random Number Generation: Common Methods

Transformation Method
The properties of a distribution can be utilized to transform a
random deviate from one distribution to another.

Example
If u, v are independent random deviates from U(0, 1), then
\[ x = \sqrt{-2 \log u} \, \cos(2\pi v) \]
\[ y = \sqrt{-2 \log u} \, \sin(2\pi v) \]
are independent standard normal deviates from N(0, 1).

Problem
Not clear how to find such a transformation for a general distribution.
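
A short sketch of the transformation in the example above (commonly known as the Box-Muller transform), assuming NumPy. The function name `box_muller` is an illustrative assumption.

```python
import numpy as np

def box_muller(n, seed=0):
    """Turn pairs of U(0, 1) deviates into pairs of independent N(0, 1) deviates."""
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random(n)   # shift to (0, 1] so that log(u) is finite
    v = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u))
    return r * np.cos(2.0 * np.pi * v), r * np.sin(2.0 * np.pi * v)

x, y = box_muller(200_000)
print(x.mean(), x.std(), np.corrcoef(x, y)[0, 1])
```

The printed mean and correlation should be near 0 and the standard deviation near 1.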

Random Number Generation: Common Methods

Acceptance-Rejection Method
1 Given the PDF f(x), find a PDF g(x) such that $f(x) \le M g(x)$
for all x, where M > 1
2 Generate x from g(x) and u from U(0, 1)
3 If $u < f(x) / (M g(x))$, accept x as a sample from f(x), else
reject
4 Repeat (2) and (3) if more samples are required

Exercise
Prove that the Unconditional Acceptance Probability (percentage
of samples accepted from those generated) for the
Acceptance-Rejection Method is 1/M.

Random Number Generation: Common Methods

Example (One-dimensional)

Acceptance-Rejection Method
Image from http://www.mathworks.com

Random Number Generation: Common Methods

Example (Two-dimensional)

Uniform distribution over a 2 × 2 square and the unit circle



Random Number Generation: Common Methods

Example (Contd...)
Let's randomly sample from the uniform distribution over the unit
circle: $f(x_1, x_2) = 1/\pi$ if $x_1^2 + x_2^2 < 1$ and $f(x_1, x_2) = 0$ otherwise.

Let $g(x_1, x_2)$ be the uniform distribution over the 2 × 2 square
centred at the origin: $g(x_1, x_2) = 1/4$ if $\max\{|x_1|, |x_2|\} < 1$ and
$g(x_1, x_2) = 0$ otherwise.

Taking $M = 4/\pi$,
1 Generate $(x_1, x_2)$ from g (random point in the square)
2 If $x_1^2 + x_2^2 < 1$, then $f(x)/(M g(x)) = 1 > u$, so accept. Otherwise
$f(x)/(M g(x)) = 0 \not> u$, so reject.

Works for any $M \ge 4/\pi$
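
The two-dimensional example can be sketched as below, assuming NumPy. The function name `sample_unit_disc` is an illustrative assumption. The observed acceptance rate should be close to 1/M = π/4, which is the result the earlier exercise asks the reader to prove.

```python
import numpy as np

def sample_unit_disc(n_proposals, seed=0):
    """Acceptance-rejection with a uniform proposal on the square [-1, 1]^2."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n_proposals, 2))   # draws from g
    # With M = 4/pi, f/(M g) is 1 inside the disc and 0 outside,
    # so the test u < f/(M g) reduces to an inside-the-disc check.
    inside = (pts**2).sum(axis=1) < 1.0
    return pts[inside], inside.mean()

points, acceptance_rate = sample_unit_disc(200_000)
print(acceptance_rate, np.pi / 4)
```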



Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC)


Suitable in situations where it is impossible to sample using traditional
methods, for example when f(x) is known only up to a
normalizing constant.

Example (Pearson Type IV distribution)

\[ f(x) = k \left[ 1 + \left( \frac{x - \lambda}{a} \right)^2 \right]^{-m} \exp\left[ -\nu \tan^{-1}\left( \frac{x - \lambda}{a} \right) \right] \]
where m > 1/2 and a > 0. Numerically computing k inflates the
error of E[g(X)].

Example (Bayesian framework)


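\[ \pi(\theta \mid x) = \frac{p(x \mid \theta) \, \pi(\theta)}{\int p(x \mid \theta) \, \pi(\theta) \, d\theta} \]
It is difficult to compute $\int p(x \mid \theta) \pi(\theta) \, d\theta$, but knowing $p(x \mid \theta)$ and $\pi(\theta)$,
we know $\pi(\theta \mid x)$ up to a normalizing constant.

The density $\pi(\theta)$ is called the prior on $\theta$, and $\pi(\theta \mid x)$ is known as
the posterior on $\theta$.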

Markov Chain Monte Carlo

Theorem (Ergodic Theorem)


Given a Markov Chain $X_n$ with the stationary distribution $\pi$,
\[ \frac{1}{n} \sum_{i=1}^{n} g(X_i) \longrightarrow \int g(x) \pi(x) \, dx \quad \text{as } n \to \infty. \]

MCMC Algorithms
Construct an aperiodic and irreducible Markov Chain {Xn }
whose stationary distribution is f (x).
Metropolis-Hastings Sampler
Random Walk and Independent Chain sampling
Requires a proposal distribution
Slice Sampler
When an efficient proposal distribution is difficult to find
Gibbs Sampler
Multivariate sampling using conditional distributions
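
A minimal sketch of a random-walk Metropolis-Hastings sampler, assuming NumPy. The target (a standard normal with its normalizing constant dropped), the step size 2.4, the burn-in length and the function name are all illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=2.4, seed=0):
    """Random-walk Metropolis sampler for a density known up to a constant."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()   # symmetric proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        chain[i] = x
    return chain

# Illustrative target: exp(-x^2 / 2), a standard normal without its constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
burned = chain[5_000:]   # discard burn-in
print(burned.mean(), burned.var())
```

Only ratios of the target enter the acceptance test, which is why the normalizing constant is never needed.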

Quasi-Random Numbers

Pseudorandom Numbers
An uncorrelated sequence from U(0, 1) can have all of its first n samples
in the interval (0, 0.5), and yet the next sample will still have a
probability of 0.5 of lying in (0, 0.5)

Quasi-Random Numbers
Sample the unit hypercube in a highly uniform manner.
Produce a low-discrepancy sequence, with points generated
in a highly correlated manner
Try to produce a sequence with equal number of points in
each sub-cube of uniform partitions of the unit hypercube
Improve convergence properties of expectations
Examples: Halton, Sobol and Latin Hypercube sequences
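
A small pure-Python sketch of the Halton sequence, built from the radical-inverse (van der Corput) construction. The function names, the bases (2, 3) and the quarter-disc area test are illustrative assumptions.

```python
def van_der_corput(index, base):
    """Radical inverse of `index` in `base`: reflect its digits about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence, one coprime base per dimension."""
    return [[van_der_corput(i, b) for b in bases] for i in range(1, n_points + 1)]

points = halton(1000)
# Quasi-Monte Carlo estimate of the area of the quarter unit disc (pi / 4).
estimate = sum(x * x + y * y < 1.0 for x, y in points) / len(points)
print(points[:3], estimate)
```

In base 2 the first three values are 0.5, 0.25 and 0.75, so each new point falls into the largest remaining gap.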

Quasi-Random Numbers

Example (Halton Sequence)

Uniform Random scatter (left) showing the clumping phenomenon and Quasi-Random scatter (right)
Images from http://www.mathworks.com

Quasi-Random Numbers

Example (Sobol Sequence)

100 (left) and 1000 (right) points: Sobol sequence


Images from http://www.wikipedia.org

Quasi-Random Numbers

Example (Sobol Sequence)

10000 points: Sobol (left) and pseudorandom (right) sequence


Images from http://www.wikipedia.org

2 Regression Models

Regression

Problem
Given a set of one or more independent variables (explanatory
or predictor variables), can we model or predict a dependent
variable (response variable) with some degree of confidence?

Regression Equation
Let X be an n × p matrix of n observations of p independent
variables, Y be an n × 1 vector of observations of the dependent
variable, and β be a p × 1 vector of parameters.

\[ Y = g(X, \beta) + \epsilon \]

where $\epsilon$ is an n × 1 vector of errors (residuals) which are not
explained by the model.

Linear Regression

Linear Regression
The function g(X, β) is linear in parameters (but not necessarily
linear in X).

Example (Arbitrage Pricing Theory)


The actual return on the asset i ($r_i$) is modeled as

\[ r_i = E(r_i) + \beta_{i1} f_1 + \beta_{i2} f_2 + \cdots + \beta_{iK} f_K + u_i \]

where

$E(r_i)$ = Expected return on asset i
$f_j$ = Macroeconomic factor j
$\beta_{ij}$ = Factor loading of asset i on factor j
$u_i$ = Idiosyncratic return of asset i

Linear Regression

Classical Normal Linear Regression Model (CNLRM)


Takes $g(X, \beta) = X\beta$ and assumes $\epsilon_i \sim N(0, \sigma^2)$ so that

\[ Y_i \sim N(X_i \beta, \sigma^2) \]

Widely used in finance and elsewhere

Ordinary Least Squares (OLS) and Maximum Likelihood
(ML) estimators for β are the same,

\[ \hat{\beta}_{OLS} = \hat{\beta}_{ML} = (X'X)^{-1} X'Y \]

OLS (and ML) estimators are the Best Unbiased Estimators,
and follow well-known distributions.
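
The closed-form estimator can be sketched with NumPy as below. The simulated data, the coefficient values and the helper name `ols` are illustrative assumptions.

```python
import numpy as np

def ols(X, Y):
    """Solve the normal equations (X'X) beta = X'Y for beta."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])                          # illustrative values
Y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = ols(X, Y)
# np.linalg.lstsq is numerically safer when X'X is ill-conditioned.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```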

Linear Regression

Example (Least Squares Fit)



Linear Regression

Piecewise Regression
Independent variables have different effects (values of βi ) over
different ranges.
Curve-fitting problems (p = 1), e.g. fitting the term-structure
of interest rates

Spline Regression
Given many points, {(xj , yj )}, and fewer knot abscissas, {xi },
estimate parameters (usually {yi }) such that the resulting spline
interpolation is the model-fit
Different from Spline Interpolation: Given fixed knot points,
{(xi , yi )}, interpolate y at a given x between knots
Cubic Spline Regression Model uses Natural Cubic Splines
Monotone Spline Regression uses monotonic splines
(Monotonic Cubic Hermite splines, M-splines or I-splines)

Linear Regression

Example (Term-structure fitting)

A Zero Curve obtained by bootstrapping using market data, and a spline fitted using spline regression
Image from http://www.mathworks.com

Non-linear Regression

Problems
Modeling non-normal dependent variables
Modeling binary data or probabilities
Modeling censored data

Examples
1 Arrival of buy/sell orders in the market
2 Counterparty default/mortgage prepayment probability
3 Actuarial science (insurance pricing, loss reserving, accident
frequencies)
4 Execution times of limit orders with cancellations

(Possible) Solutions
Generalized Linear Models (GLM)
Logistic/Probit Regression
Survival Regression
Generalized Linear Mixed Models (GLMM), Generalized
Additive Models (GAM)
Common Estimation methods: ML, Weighted Least Squares
(WLS), Iteratively Reweighted Least Squares (IRLS)

Non-linear Regression

Generalized Linear Models (GLM)


Relax two assumptions of the CNLRM
1 Distribution of Y is from the Exponential Family
\[ f(y, \theta, \psi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\psi)} + c(y, \psi) \right\} \]
Depending on a, b and c, the distribution could be
Discrete: Bernoulli, Binomial, Negative Binomial, Multinomial,
Poisson, Geometric
Continuous: Normal, Exponential, Gamma, Chi-square,
Beta, Dirichlet, Weibull
2 The Link Function g(X, β) = E(Y|X) has the form
\[ E(Y \mid X) = G(X\beta) = G(\eta) \]


Non-linear Regression

Example (GLM: Bernoulli Distribution)

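\[
\begin{aligned}
f(y, \mu) &= \mu^{y} (1 - \mu)^{1 - y} \\
&= \exp\{ y \log \mu + (1 - y) \log(1 - \mu) \} \\
&= \exp\left\{ y \log\left( \frac{\mu}{1 - \mu} \right) + \log(1 - \mu) \right\}
\end{aligned}
\]

And so $\theta = \log\left( \frac{\mu}{1 - \mu} \right) \Leftrightarrow \mu = \frac{e^{\theta}}{1 + e^{\theta}}$, and

$a(\psi) = 1$
$b(\theta) = -\log(1 - \mu) = \log(1 + e^{\theta})$
$c(y, \psi) = 0$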

Non-linear Regression

GLM: Canonical Link


G is canonical if it is chosen such that Xβ = θ = η.

Link functions for Bernoulli Distribution


1 Logit Model: G (canonical) is the Logistic Distribution CDF

\[ G(z) = e^{z} / (1 + e^{z}) \]

2 Probit Model: G is the standard normal CDF

G(z) = Φ(z)

3 Log-log Model: G is the Complementary Log-Log Link

G(z) = 1 − exp {− exp (z)}
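
A minimal sketch of fitting the Logit Model by Iteratively Reweighted Least Squares, assuming NumPy. The simulated data, the coefficient values, the fixed iteration count and the function name are illustrative assumptions; a production fit would add a convergence check.

```python
import numpy as np

def fit_logit_irls(X, y, n_iter=25):
    """Logistic regression by iteratively reweighted least squares (Newton's method)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))          # G(eta) = e^eta / (1 + e^eta)
        w = mu * (1.0 - mu)                      # Bernoulli variance function
        # Newton step: beta += (X' W X)^{-1} X' (y - mu)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
    return beta

rng = np.random.default_rng(0)
n = 5_000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([-0.5, 1.2])                # illustrative values
p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < p).astype(float)

beta_hat = fit_logit_irls(X, y)
print(beta_hat)
```

Swapping in the probit or complementary log-log G changes the expressions for `mu` and the weights, because the simple form used here relies on the logit link being canonical.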



Survival Regression

Censored Data Types

Images from http://www.weibull.com



Survival Regression

Survival Regression
Dependent data has two fields, value and censoring type
Uncensored, Right Censored (suspended), Interval Censored
(inspected) or Left Censored
Weibull, Lognormal, Log-logistic distributions common
Two model types: Accelerated Failure Time and Proportional
Hazards
Estimation method: ML, which uses the Survival Function

Definition (Survival Function)

For CDF F(x), the survival function at x, S(x), is the probability
that X > x.
\[ S(x) = P(X > x) = 1 - F(x) = \int_{x}^{\infty} f(t) \, dt \]
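
To illustrate how ML uses the survival function, here is a sketch for right-censored exponential data, assuming NumPy. Uncensored points contribute log f(t) and censored points contribute log S(t) to the log-likelihood, which for the exponential model gives a closed-form estimate. The exponential model, the censoring time and all names are illustrative assumptions.

```python
import numpy as np

def exponential_mle_right_censored(times, observed):
    """ML rate estimate: events contribute log f(t), censored points log S(t).

    For the exponential model this gives lam_hat = (#events) / (total time at risk).
    """
    return observed.sum() / times.sum()

rng = np.random.default_rng(0)
lam_true, censor_at = 2.0, 0.5                    # illustrative values
lifetimes = rng.exponential(1.0 / lam_true, size=20_000)
observed = lifetimes <= censor_at                 # False means right censored
times = np.minimum(lifetimes, censor_at)

lam_hat = exponential_mle_right_censored(times, observed)
lam_naive = 1.0 / times.mean()                    # treats censored points as events
print(lam_hat, lam_naive)
```

The naive estimate overstates the rate because it counts every censoring time as a failure.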

3 Time Series Models

Time Series Concepts

Basic Concepts
Lag: Forward or backward shift of a time-series.
Lag-operator: Operates on a time-series, $L X_t = X_{t-1}$.
Autocorrelation: Correlation of a time-series with itself at a
specific lag.
Autocorrelation Function (ACF): Autocorrelation of a time-series
as a function of lag.
Partial Autocorrelation at lag k: Correlation between $X_t$ and
$X_{t-k}$ not accounted for by lags 1, 2, ..., k − 1.
Partial Autocorrelation Function (PACF): Partial Autocorrelation
of a time-series as a function of lag.
Cross-correlation: Correlation between two time-series at a
specific lag.

Time Series Concepts

Basic Concepts
Cross-correlation Function (CCF): Cross-correlation of two
time-series as a function of lag.
Stationary Time-Series: A time-series having the same distribution
at different times.
No trend, and no explosion.
Weak-stationarity: First and second moments of a time-series
are the same at different times.
Integration: If the d-th difference of a time-series is stationary
the series is said to be integrated of order d, denoted by I(d).
Cointegration: The phenomenon when two time series are
non-stationary but a linear combination of them is stationary.
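
A brief sketch of the sample autocorrelation function and of differencing an integrated series, assuming NumPy. The random-walk example and the function name `sample_acf` are illustrative assumptions.

```python
import numpy as np

def sample_acf(x, n_lags):
    """Sample autocorrelation r_k for k = 0, ..., n_lags."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    return np.array([d[k:] @ d[:len(x) - k] / denom for k in range(n_lags + 1)])

rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(2_000))   # an I(1) series
increments = np.diff(random_walk)                     # first difference, stationary

print(sample_acf(random_walk, 3))   # decays very slowly: non-stationary
print(sample_acf(increments, 3))    # near 0 beyond lag 0
```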

Time Series Concepts

Example (Cross-correlation)

Sample cross-correlation function



Time Series Concepts

Example (Cointegration)

Example of cointegration
Image from http://www.federalreserve.gov/

Time Series Concepts

Non-stationarity

Run sequence plot of original time series (left) showing non-stationarity and the differenced series (right)
Images from http://www.itl.nist.gov

Time Series Models

Autoregressive (AR) Model


AR model of order p, denoted by AR(p), is defined as
\[ x_t = \varphi_0 + \sum_{i=1}^{p} \varphi_i x_{t-i} + \epsilon_t \]

In Lag-notation
\[ \left( 1 - \sum_{i=1}^{p} \varphi_i L^i \right) x_t = \varphi_0 + \epsilon_t \]

Properties
1 All-pole Infinite Impulse Response (IIR) filter
2 ACF decays, PACF is cut-off at lag p

Time Series Models

Example (AR Model of second order)

ACF (left) and PACF (right) of the AR(2) model

\[ x_t = 0.6 x_{t-1} - 0.08 x_{t-2} + \epsilon_t \]
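
The decaying ACF of this AR(2) model can be reproduced with the sketch below, assuming NumPy. It computes the theoretical ACF from the Yule-Walker recursion and compares lag 1 against a simulated path. The N(0, 1) innovations, burn-in length and function names are illustrative assumptions.

```python
import numpy as np

def ar2_theoretical_acf(phi1, phi2, n_lags):
    """ACF of a stationary AR(2) from the Yule-Walker recursion (n_lags >= 1)."""
    rho = np.empty(n_lags + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1.0 - phi2)
    for k in range(2, n_lags + 1):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

def simulate_ar2(phi1, phi2, n, burn_in=500, seed=0):
    """Simulate x_t = phi1 x_{t-1} + phi2 x_{t-2} + eps_t with N(0, 1) innovations."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]
    return x[burn_in:]

rho = ar2_theoretical_acf(0.6, -0.08, 5)
x = simulate_ar2(0.6, -0.08, 50_000)
d = x - x.mean()
r1 = (d[1:] @ d[:-1]) / (d @ d)    # sample lag-1 autocorrelation
print(rho, r1)
```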

Time Series Models

Moving Average (MA) Model


MA model of order q, denoted by MA(q), is defined as
\[ x_t = \mu + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t \]

In Lag-notation
\[ x_t = \mu + \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t \]

Properties
1 Finite Impulse Response (FIR) filter
2 PACF decays, ACF is cut-off at lag q

Time Series Models

Example (MA Model of second order)

ACF (left) and PACF (right) of the MA(2) model

\[ x_t = \epsilon_t - 0.6 \epsilon_{t-1} - 0.3 \epsilon_{t-2} \]
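
The cut-off of the ACF at lag q can be checked directly from the MA coefficients, as in this sketch (NumPy assumed; the function name `ma_theoretical_acf` is an illustrative assumption).

```python
import numpy as np

def ma_theoretical_acf(theta, n_lags):
    """ACF of x_t = eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}."""
    psi = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    gamma0 = psi @ psi
    rho = np.zeros(n_lags + 1)
    for k in range(min(n_lags, len(psi) - 1) + 1):
        rho[k] = psi[k:] @ psi[:len(psi) - k] / gamma0
    return rho

rho = ma_theoretical_acf([-0.6, -0.3], 4)
print(rho)   # zero beyond lag q = 2
```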

Time Series Models

ARMA Model
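Contains both AR and MA terms. ARMA(p,q) is defined as
\[ x_t = \varphi_0 + \sum_{i=1}^{p} \varphi_i x_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t \]

In Lag-notation
\[ \left( 1 - \sum_{i=1}^{p} \varphi_i L^i \right) x_t = \varphi_0 + \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t \]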

Properties
1 ACF and PACF both decay.

Time Series Models

ARIMA Model
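ARIMA(p,d,q) process is a process whose d-th difference is
ARMA(p,q)
\[ \left( 1 - \sum_{i=1}^{p} \varphi_i L^i \right) (1 - L)^d x_t = \varphi_0 + \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t \]

ARMAX/ARIMAX Model
ARMAX(p,q,b) process is an ARMA(p,q) process with b exogenous
terms
\[ x_t = \varphi_0 + \sum_{i=1}^{p} \varphi_i x_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \sum_{i=1}^{b} \eta_i d_{t-i} + \epsilon_t \]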

Time Series Models

ARMA models in Higher dimensions


VAR, VMA, VARMA, VARMAX, VARIMA and VARIMAX
models
Straightforward extensions of the scalar counterparts
Vector Error Correction Model (VECM)
For cointegrated time series

ARMA models in finance


Directly used to model several financial time series
Return series
Market Microstructure: Bid/ask series
Indirectly used in ARCH/GARCH models for volatility

State Space Models

State Space Models


In Control Theory, a state-space model is a system which has
1 An unobserved/hidden state
2 An observed output
3 An input (also called the control)

Example
Vehicle Navigation
State: Vehicle position and velocity
Observed output: Measured position
Control: Acceleration or deceleration

State Space Models

Kalman Filter
A recursive estimator for a linear state space system which provides
the best linear prediction in terms of MSE

Kalman Filter in Finance and Economics


Estimating time-varying parameters in linear regression
models
Modeling forward exchange rate premia
Modeling short term risk-free rate
Estimations in Stochastic Volatility models
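
A minimal sketch of a scalar Kalman filter, assuming NumPy and the simplest state-space form (a random-walk state observed with noise, with no control input). The model, the noise variances, the diffuse prior and the function name are illustrative assumptions.

```python
import numpy as np

def kalman_local_level(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q and r are the variances of w_t and v_t; m0, p0 are the prior mean and variance.
    """
    m, p = m0, p0
    means, variances = [], []
    for obs in y:
        p_pred = p + q                    # predict: variance grows by q
        gain = p_pred / (p_pred + r)      # Kalman gain
        m = m + gain * (obs - m)          # update with the innovation
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(0)
state = np.cumsum(0.1 * rng.standard_normal(500))       # hidden random-walk state
y = state + rng.standard_normal(500)                    # noisy observations
means, variances = kalman_local_level(y, q=0.01, r=1.0)
print(np.mean((means - state) ** 2), np.mean((y - state) ** 2))
```

The filtered estimates should track the hidden state with a much smaller mean squared error than the raw observations.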

State Space Models

Kalman Filter

Block diagram of Kalman Filter


Image from http://www.swarthmore.edu

4 (G)ARCH Models

Modeling Volatility

Definition (Volatility)
Volatility is defined as the standard deviation of (periodic) returns
of a security over a specific time period.

More on Volatility
Often used as a proxy for risk associated with the security
Implied Volatility is the volatility implied by market price of an
option, using a theoretical model (e.g. Black-Scholes option
pricing model)
Volatility Clustering is the phenomenon of periods of high and low
volatility occurring in clusters

Modeling Volatility

Volatility plots

Volatility Smile/Skew: Implied volatility versus strike price or moneyness of an option
Image from http://www.riskgloassary.com

Modeling Volatility

Volatility plots

Term structure: Implied volatility versus maturity of an option


Implied Volatility Surface: 3-D plot combining the volatility
smile/skew and term structure of volatility
Image from http://www.riskgloassary.com

Modeling Volatility

Modeling Volatility
Two common approaches to model the behaviour of volatility
1 Stochastic Approach
Volatility treated as a stochastic process or a diffusion
Calibration is the estimation of the diffusion parameters
Examples: Heston Model, 3/2 Model and Chen Model
2 Econometric Approach
Autoregressive type models used
Volatility and return processes modeled together
Examples: Various types of (G)ARCH Models

ARCH/GARCH Models

Definition (Heteroskedastic Variable)


Variable whose variance is not uniform (changes with time).
Conditional heteroskedasticity in stock, bond prices.

Example (Heteroskedastic NASDAQ Returns)



ARCH/GARCH Models

ARCH Model
ARCH stands for Autoregressive Conditional Heteroskedasticity.
An ARCH(q) process models the return ($r_t$) as

\[ r_t = \mu_t + \epsilon_t = \mu_t + \sqrt{h_t} \, \eta_t \]
\[ h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 \]

Here $\mu_t = E(r_t \mid \mathcal{F}_{t-1})$ is the conditional mean of the return process
(taken as a constant, an AR, or an ARMA process). $\epsilon_t$ is
called the innovation of the mean process and $\eta_t$ the standardized
innovation (often taken as standard normal). $h_t = \sigma_t^2$ is the
conditional variance (squared volatility) of the return process.

ARCH/GARCH Models

GARCH Model
GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity.
A GARCH(p,q) process looks like

\[ r_t = \mu_t + \epsilon_t = \mu_t + \sqrt{h_t} \, \eta_t \]
\[ h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i h_{t-i} \]

where the notations are the same as in the ARCH model.


Several variants of GARCH models - GARCH-M, IGARCH,
EGARCH, GARCH-GJR, et cetera
Usually (G)ARCH models estimated using ML
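
A short simulation sketch of a constant-mean GARCH(1,1) process, assuming NumPy and standard normal standardized innovations. The parameter values and the function name are illustrative assumptions. For a stationary GARCH(1,1) the unconditional variance is $\alpha_0 / (1 - \alpha_1 - \beta_1)$, which the sample variance should approach.

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, beta1, n, mu=0.0, seed=0):
    """Simulate r_t = mu + sqrt(h_t) * eta_t with GARCH(1,1) conditional variance."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    h = np.empty(n)
    eps = np.empty(n)
    h[0] = alpha0 / (1.0 - alpha1 - beta1)      # start at the unconditional variance
    eps[0] = np.sqrt(h[0]) * eta[0]
    for t in range(1, n):
        h[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * h[t - 1]
        eps[t] = np.sqrt(h[t]) * eta[t]
    return mu + eps, h

returns, h = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, n=100_000)
print(returns.var(), 0.1 / (1.0 - 0.1 - 0.8))
```

Plotting `returns` would show the volatility clustering that motivates these models.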

ARCH/GARCH Models

NASDAQ returns: Constant mean, GARCH(1,1) model

Inferred innovations ($\hat{\epsilon}_t$)

ARCH/GARCH Models

NASDAQ returns: Constant mean, GARCH(1,1) model

Inferred conditional standard deviations ($\sqrt{\hat{h}_t}$)

ARCH/GARCH Models

NASDAQ returns: Constant mean, GARCH(1,1) model

Inferred standardized innovations ($\hat{\eta}_t$)


5 Modeling Process

Modeling Process

Initial Modeling Steps


1 Pre-processing to get data in appropriate form
Nulls, Categorical or Interaction variables, Panel Data
2 Outlier Filtering to remove influential or noisy points
Mahalanobis Outlier Filter for multivariate data
3 Graphical visualization and quantitative tests
Box plots, Histograms, Scatterplots, Run Sequence Plot
ACF/PACF/CCF for time series
Hypothesis Testing
4 Model Selection
Functional expert’s intuition
Modeler’s experience
Academic literature

Modeling Process

Example (Outlier Treatment)

Regression with and without outlier treatment



Modeling Process

Model Estimation Problem


Given a model and data, how does one estimate the parameters?

Common Estimation Methods


1 Least Squares (OLS/GLS/WLS/IRLS)
2 Method of Moments
3 ML, Pseudo ML, Quasi ML, Expectation Maximization (EM)
4 Regularization (for high dimensional sparse regression)
Ridge Regression: Constrains L2 norm of parameter vector
Lasso: Constrains L1 norm of parameter vector
Dantzig Selector: Minimizes L1 norm of parameter vector,
constraining L∞ norm of the correlation between regressors and residuals
5 Bayesian approach

Modeling Process

Post-fit Model Diagnosis


Graphical/quantitative tests for data, estimates, unusual effects
Significance tests: How much confidence do we have in the
fitted model and estimated parameters?
Is the model any better than predicting the mean of Y?
Are assumptions of the model violated?
Do CNLRM residuals have 0 mean and constant variance?
Do results show presence of unusual effects?
Multicollinearity, Cointegration, GARCH effect
Do the values (signs) of parameters make sense?
How well does the model fit the data?
MSE, R-Square, Confidence Intervals and Point forecasts
How does the model compare with other models?
Information Criteria, Likelihood Ratio Test, Mallows' Cp

Modeling Process

Example (Underfitting and Overfitting)

Image from http://www.dtreg.com



Modeling Process

Example (Underfitting and Overfitting)

Image from http://cg.postech.ac.kr



Modeling Process

Cross Validation
Model tested against independent data not used for estimation.
1 Holdout Validation
Training set (insample) and validation set (holdout sample)
Simple to use, but results depend on partitioning
2 Repeated Random Subsampling Validation
Data randomly split several times, results summarized
Proportion not fixed, but results vary every time
3 Leave-one-out Cross-Validation (LOOCV)
Validation set has one sample, others form training set
Repeated for all points, results summarized
Influential outliers detected, but computationally expensive
4 K-fold Cross-Validation
Data divided into K subsets, each made validation set
Repeated K times (folds)
K = N is same as LOOCV
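
K-fold cross-validation can be sketched as below for a linear model, assuming NumPy. The simulated data, the use of out-of-sample MSE as the summary and the function name are illustrative assumptions.

```python
import numpy as np

def k_fold_mse(X, y, k, seed=0):
    """Average out-of-sample MSE of an OLS fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        residuals = y[val_idx] - X[val_idx] @ beta
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + 0.3 * rng.standard_normal(n)     # illustrative linear data
X = np.column_stack([np.ones(n), x])

mse_5fold = k_fold_mse(X, y, k=5)
mse_loo = k_fold_mse(X, y, k=n)      # k = n reproduces leave-one-out
print(mse_5fold, mse_loo)
```

Both estimates should land near the noise variance of 0.09 used to generate the data.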

Modeling Process

Analysis Approaches in Statistics


1 Classical Approach
Problem → Data → Model → Analysis → Conclusions
2 Bayesian Approach
Problem → Data → Model → Prior → Analysis → Conclusions
3 Exploratory Data Analysis (EDA)
Problem → Data → Analysis → Model → Conclusions

6 Questions

Questions?

Thank You!
