Regression analysis
Otavio R de Medeiros
Introduction

A regression is a model of the relationship between one variable on one side and one or more variables on the other side. Regression analysis constructs and tests a mathematical model of the relationship between one dependent (endogenous) variable and one or more independent (exogenous) variables. The direction of causality between the variables is determined from a priori information (e.g. theory) and embodied in the model by way of hypothesis. Regression analysis then tests the statistical strength of the model as hypothesized. E.g. suppose the level of the FTSE 100 is linearly dependent on the S&P 500; we can test this hypothesis using simple linear regression (Fig. 6.1). Regressions can be simple (2 variables) or multiple (more than 2 variables). There are 3 types of regression data: time series, cross section, and panel data.
[Fig. 6.1: scatter of Y against X (X = independent variable)]
The simple linear regression model is

Y = α + βX + e

where Y = dependent variable; X = independent variable; α = constant or intercept; β = slope or regression coefficient; e = random error or disturbance term. The error term exists because there are other unobserved and unknown effects not included in the regression. We cannot infer causality: regression analysis cannot prove a hypothesis, it can only support or fail to support the hypothesis formulated.
Regression analysis

Ordinary least-squares (OLS) regression: to test the relationship between Y and X it is necessary to derive the values of α, β and e using a method that yields the Best Linear Unbiased Estimator (BLUE):
Best: most efficient, i.e. smallest variance
Linear
Unbiased: E(α̂) = α; E(β̂) = β
Ordinary least squares (OLS) minimizes the sum of squared errors, i.e. it minimizes Σe². If the data comply with the assumptions to be seen later, OLS gives the BLUE, i.e. the straight line that best fits the data, calculated as the line that minimizes the sum of squared differences between Y and Ŷ (Fig. 6.2).
Regression analysis

Statistical assumptions of OLS regression:
1. The mathematical form of the relationship between the true dependent variable Y and the independent variable X is

Y = α + βX + e

Estimated model:

Ŷ = α̂ + β̂X

2. The error term e is normally distributed with zero mean and constant variance σ², i.e. e ~ N(0, σ²)
3. Successive error terms are independent of each other, i.e. cov(eᵢ, eⱼ) = 0 for i ≠ j
4. X is non-stochastic (exogenous), i.e. fixed in repeated samples and uncorrelated with the error term
Regression analysis

Normality is also known as Gaussianity, i.e. having a Gaussian distribution. If e has constant variance σ², this is called homoscedasticity. If the variance of e is not constant, this is called heteroscedasticity, the opposite of homoscedasticity. If cov(eᵢ, eⱼ) = 0, the residuals e are called non-autocorrelated or non-serially correlated. This assumption means that the factors that caused one observation of Y to show an error do not automatically cause the other observations of Y to show errors. When the e values are independent, the data are said to be non-autocorrelated.
Regression analysis

As Y is related to e in a linear form, Y itself is a random variable. For any value of X, Y will be ~ N(μ, σ²), and therefore the statistical distribution of Y can be fully described by its mean and variance. The expected value (mean) of Y:

Yᵢ = α + βXᵢ + eᵢ
E(Yᵢ) = E(α + βXᵢ + eᵢ) = E(α) + E(βXᵢ) + E(eᵢ)

But since E(eᵢ) = 0,

E(Yᵢ) = α + βXᵢ
Regression analysis

As the expected value of e is 0, the variance of Y, which is also the variance of e, is the mean value of e², i.e. Σ(eᵢ − 0)²/n = Σeᵢ²/n = E(eᵢ²) = σ². Thus Y ~ N(α + βX, σ²). This can be seen in Fig. 6.3.
Yᵢ = α + βXᵢ + eᵢ

If we take variances on both sides, we get

Var(Yᵢ) = Var(α + βXᵢ + eᵢ) = Var(α) + Var(βXᵢ) + Var(eᵢ) = Var(eᵢ)

(Var(α) = Var(βXᵢ) = 0 because α, β and Xᵢ are non-stochastic), and

Var(eᵢ) = Σ(eᵢ − ē)²/n = Σ(eᵢ − 0)²/n = Σeᵢ²/n = E(eᵢ²) = σ²

Thus Yᵢ ~ N(α + βXᵢ, σ²)
Regression analysis

Fitting the regression line: the values of α̂ and β̂ that minimize Σe² are

β̂ = cov(X, Y)/var(X) = Σ[(Xᵢ − X̄)(Yᵢ − Ȳ)] / Σ(Xᵢ − X̄)²

α̂ = Ȳ − β̂X̄

eᵢ = Yᵢ − Ŷᵢ
The sum of squares is minimized when the partial derivatives with respect to α̂ and β̂ are set to zero, i.e.

Σ(−2(Y − α̂ − β̂X)) = 0
Σ(−2X(Y − α̂ − β̂X)) = 0

This is achieved when

ΣY = nα̂ + β̂ΣX
ΣXY = α̂ΣX + β̂ΣX²
Regression analysis

This is a simultaneous-equation problem. Multiply the 1st equation by ΣX and the 2nd by n:

ΣXΣY = nα̂ΣX + β̂(ΣX)²
nΣXY = nα̂ΣX + nβ̂ΣX²

Subtracting the 1st equation from the 2nd gives

nΣXY − ΣXΣY = β̂(nΣX² − (ΣX)²)

∴ β̂ = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²)
Since ΣX = nX̄ and ΣY = nȲ,

β̂ = (nΣXY − nX̄·nȲ) / (nΣX² − (nX̄)²) = (ΣXY − nX̄Ȳ) / (ΣX² − nX̄²)
Starting from the 1st normal equation,

ΣY = nα̂ + β̂ΣX

Dividing by n:

(1/n)ΣY = α̂ + β̂(1/n)ΣX ∴ Ȳ = α̂ + β̂X̄

Solving for α̂:

α̂ = Ȳ − β̂X̄
Regression analysis

Example: table 6.1 (page 189) and page 194. Applying β̂ = cov(X, Y)/var(X) and α̂ = Ȳ − β̂X̄ gives

Ŷ = 196.3298 + 5.964X

i.e. an intercept of 196.33.
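The fitted line can be reproduced numerically. A minimal sketch in NumPy, using made-up data in place of Table 6.1 (which is not reproduced here): the slope is cov(X, Y)/var(X) and the intercept is Ȳ − β̂X̄.

```python
import numpy as np

# Made-up data standing in for Table 6.1 (true alpha = 196.33, beta = 5.964)
rng = np.random.default_rng(0)
X = rng.uniform(300.0, 500.0, 60)
Y = 196.33 + 5.964 * X + rng.normal(0.0, 100.0, 60)

# OLS estimates: beta_hat = cov(X, Y)/var(X); alpha_hat = Ybar - beta_hat*Xbar
Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))
Sxx = np.sum((X - X.mean()) ** 2)
beta_hat = Sxy / Sxx
alpha_hat = Y.mean() - beta_hat * X.mean()
```

As a cross-check, `np.polyfit(X, Y, 1)` returns the same slope/intercept pair.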
Regression analysis

Significance tests of coefficients: as shown in Fig. 6.3, calculating the regression coefficients gives single estimates of Y. The estimated regression coefficients are also assumed to come from a normal distribution. We need to know the statistical significance of these coefficients, which we assess by testing whether the regression coefficients are significantly different from zero. The statistical significance of a coefficient is measured by the degree of dispersion around its estimated value. As the errors or residuals are assumed to be ~ N(0, σ²), the standard deviation of the errors is used to measure that dispersion. These standard deviations are called the standard errors of the coefficients.
Regression analysis

Significance tests of coefficients: we use t-statistics to indicate the degree of significance of the coefficients. To derive these measures we need to know the sampling distribution of the coefficients and estimates of their variances and standard deviations. We can then perform tests of hypotheses concerning the coefficients or construct confidence intervals for them.
The sampling distribution of α̂ is

α̂ ~ N(α, σ²ΣXᵢ² / (nΣ(Xᵢ − X̄)²))

The sampling distribution of β̂ is

β̂ ~ N(β, σ² / Σ(Xᵢ − X̄)²)
Since σ² is unknown, it is estimated by the regression variance

s² = Σeᵢ²/(n − 2) = Σ(Yᵢ − Ŷᵢ)²/(n − 2)

SE of α̂:

SE(α̂) = s √(ΣXᵢ² / (nΣ(Xᵢ − X̄)²))

SE of β̂:

SE(β̂) = s / √(Σ(Xᵢ − X̄)²) = √( Σ(Yᵢ − Ŷᵢ)² / ((n − 2)Σ(Xᵢ − X̄)²) )
Regression analysis

For data with a normal distribution, the difference between a variable and its mean, divided by the estimate of its standard deviation, has a t-distribution. The probability statements are:

P(−t_{n−2, c/2} ≤ (α̂ − α)/SE(α̂) ≤ t_{n−2, c/2}) = 1 − c
P(−t_{n−2, c/2} ≤ (β̂ − β)/SE(β̂) ≤ t_{n−2, c/2}) = 1 − c
Regression analysis
Thus we have probability 1 − c that the true value of the coefficients falls within the range specified. If that range includes zero, the coefficient is not statistically significantly different from zero.
The variance of the random variable uₜ is given by Var(uₜ) = E[(uₜ − E(uₜ))²], which reduces to Var(uₜ) = E(uₜ²). We could estimate this using the average of uₜ²:

s² = (1/T) Σuₜ²

Unfortunately this is not workable, since uₜ is not observable. We can use the sample counterpart to uₜ, which is the residual ûₜ:

s² = Σûₜ² / (T − 2)

where T − 2 is the number of degrees of freedom (two parameters are estimated).
Example

Example: the following model with k = 3 is estimated over 15 observations:

y = β₁ + β₂x₂ + β₃x₃ + u

and the following quantities have been calculated from the original X's:

(X'X)⁻¹ = [ 2.0  3.5  −1.0
            3.5  1.0   6.5
           −1.0  6.5   4.3 ],  X'y = [−3.0, 2.2, 0.6]',  u'u = 10.96

Calculate the coefficient estimates and their standard errors. To calculate the coefficients, just multiply the matrix by the vector to obtain β̂ = (X'X)⁻¹X'y. To calculate the standard errors, we need an estimate of σ²:

s² = RSS/(T − k) = 10.96/(15 − 3) = 0.91
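The arithmetic of this example can be replayed in a few lines; note that the matrix entries and signs above are reconstructed from a garbled original, so treat the numbers as an assumption.

```python
import numpy as np

# Worked example with k = 3, T = 15: beta_hat = (X'X)^-1 X'y,
# s^2 = u'u / (T - k), SE(beta_i) = sqrt(s^2 * [(X'X)^-1]_ii)
XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])
uu, T, k = 10.96, 15, 3

beta_hat = XtX_inv @ Xty                 # coefficient estimates
s2 = uu / (T - k)                        # estimate of sigma^2 (= 0.91)
se = np.sqrt(s2 * np.diag(XtX_inv))     # standard errors
```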
Recall that the formula for the test-of-significance approach to hypothesis testing using a t-test is

test statistic = (β̂ᵢ − βᵢ*) / SE(β̂ᵢ)

If the test is H₀: βᵢ = 0 against H₁: βᵢ ≠ 0, i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test. Since βᵢ* = 0,

test statistic = β̂ᵢ / SE(β̂ᵢ)

The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
Compare this with t_crit with 15 − 3 = 12 degrees of freedom (2.5% in each tail for a 5% test), at the 5% and 1% significance levels. Do we reject H₀: β₁ = 0? H₀: β₂ = 0? H₀: β₃ = 0?
The hypotheses for the simple regression coefficients are

H₀: α = 0, H₁: α ≠ 0
H₀: β = 0, H₁: β ≠ 0

To test these hypotheses, we need to calculate the t-statistics for the coefficients:

t_α = α̂ / SE(α̂)
t_β = β̂ / SE(β̂)
Regression analysis

It is usual to test for statistical significance at the 95% or 99% level of confidence. That means that there is a 95% or 99% probability that the values of α̂ and β̂ are not due to chance. The t-statistics follow a t-distribution with n − 2 degrees of freedom, i.e. the number of data points used in the regression minus the two estimated coefficients. The regression coefficients are significant if the t-statistic is greater than the critical value given in the t-distribution tables. From the book example (page 198), β̂ = 5.964 and SE(β̂) = 0.3476, hence t = 5.964/0.3476 = 17.1577. The test statistic for α̂ is t = 196.3298/136.991 = 1.4332. 95% confidence intervals:

for α: 196.3298 − 2 × 136.991 ≤ α ≤ 196.3298 + 2 × 136.991 ⇒ −77.65 ≤ α ≤ 470.31
for β: 5.964 − 2 × 0.3476 ≤ β ≤ 5.964 + 2 × 0.3476 ⇒ 5.27 ≤ β ≤ 6.66
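The t-ratios and 2-standard-error confidence intervals quoted in the example can be verified directly (the critical value is approximated by 2, as in the text):

```python
# Book estimates (page 198): coefficient and standard-error pairs
alpha_hat, se_alpha = 196.3298, 136.991
beta_hat, se_beta = 5.964, 0.3476

t_alpha = alpha_hat / se_alpha   # ~1.43: alpha not significant at 5%
t_beta = beta_hat / se_beta      # ~17.16: beta highly significant

# Approximate 95% confidence intervals (critical value taken as 2)
ci_alpha = (alpha_hat - 2 * se_alpha, alpha_hat + 2 * se_alpha)
ci_beta = (beta_hat - 2 * se_beta, beta_hat + 2 * se_beta)
```

Because the interval for α straddles zero, α is not statistically different from zero, while the interval for β lies well away from zero.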
Regression analysis

A one-tailed test or a two-tailed test? We have to decide whether the significance test will be one-tailed or two-tailed. This decision is made before the regression results are known. The choice is determined by the theory underlying the model of the relationship between X and Y which the regression is testing. E.g. if a theory says that the slope of the relationship between X and Y should be greater than one, our test should be

H₀: β = 1
H₁: β > 1
[Figure: decomposition of Yᵢ − Ȳ into the explained part Ŷᵢ − Ȳ and the residual Yᵢ − Ŷᵢ, shown on a plot of Y against X]
The total sum of squares (SST) is the sum of the squared differences between Yᵢ and Ȳ:

SST = Σ(Yᵢ − Ȳ)²

The sum of squares due to the regression (SSR) is the sum of the squared differences between Ŷᵢ and Ȳ:

SSR = Σ(Ŷᵢ − Ȳ)²

The sum of squares due to the error (SSE) is the sum of the squared differences between Yᵢ and Ŷᵢ:

SSE = Σ(Yᵢ − Ŷᵢ)²
Regression analysis

SST = SSR + SSE. The ratio between SSR and SST gives the proportion of the variation in Y explained by the variation in X and is referred to as R², the coefficient of determination or goodness of fit:

R² = SSR/SST = Σ(Ŷᵢ − Ȳ)² / Σ(Yᵢ − Ȳ)²
Regression analysis

R² = SSR/SST = Σ(Ŷᵢ − Ȳ)² / Σ(Yᵢ − Ȳ)²

∴ R² = 1 − SSE/SST = 1 − Σ(Yᵢ − Ŷᵢ)² / Σ(Yᵢ − Ȳ)² = 1 − Σeᵢ² / Σ(Yᵢ − Ȳ)² = 1 − e'e / ((Y − Ȳι)'(Y − Ȳι))

(ι is a vector of ones). If the regression is so good that all the points lie exactly on the regression line, then Ŷᵢ = Yᵢ. In this case we would have R² = 1, i.e. a perfect regression. If the regression is very bad, the regression line will be the mean, i.e. Ŷᵢ = Ȳ, ∴ Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − Ȳ)² and R² = 0. Hence R² ranges from 0 to 1, i.e. 0 ≤ R² ≤ 1.
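The identity SST = SSR + SSE and the two equivalent forms of R² can be checked on any sample; a small made-up one is enough:

```python
import numpy as np

# Small made-up sample; fit Y = alpha + beta*X by OLS
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_fit = alpha_hat + beta_hat * X

SST = np.sum((Y - Y.mean()) ** 2)    # total sum of squares
SSR = np.sum((Y_fit - Y.mean()) ** 2)  # explained sum of squares
SSE = np.sum((Y - Y_fit) ** 2)       # residual sum of squares
r2 = SSR / SST
```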
Regression analysis

If R² is multiplied by 100 and expressed as a percentage, it represents the proportion of the variation in Y that is explained by the variation in X. R² is a random variable, and its significance can be tested using an F distribution. The test statistic is

F_{k−1, n−2} = (R²/(k − 1)) / ((1 − R²)/(n − 2))

For the simple regression the test has k − 1 = 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator.
From the book example,

F_{1,50} = R² / ((1 − R²)/(n − 2)) = 294.4

From the F table, we see that the 5% critical value for v₁ = 1 and v₂ = 50 is 4.03. As the value of the test statistic (294.4) is greater than 4.03, we reject the null that R² = 0.
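In a simple regression the F statistic for H₀: R² = 0 is the square of the slope's t-statistic; the 294.4 above follows from t = 17.1577, assuming n − 2 = 50:

```python
# F(1, n-2) test of H0: R^2 = 0 in a simple regression
t_beta = 17.1577
n = 52                      # so that n - 2 = 50, as in the F(1, 50) example

F = t_beta ** 2             # in a simple regression, F = t^2
r2 = F / (F + (n - 2))      # inverting F = R^2 / ((1 - R^2) / (n - 2))

crit_5pct = 4.03            # 5% critical value for v1 = 1, v2 = 50 (from the text)
reject_null = F > crit_5pct
```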
Regression analysis

Using regression for prediction — the prediction interval: the results of applying the OLS model can be used for prediction. E.g. suppose that we wish to predict the level of the FTSE 100 if the S&P 500 rose to 550. The predicted value would be Ŷ = 196.33 + 5.964 × 550 ≈ 3476.
Regression analysis

Using regression for prediction (SKIP): the standard error of the estimate (= standard error of the regression) is

s = √( Σeᵢ²/(n − 2) ) = √( Σ(Yᵢ − Ŷᵢ)²/(n − 2) )

The prediction interval is

Ŷ ± t₉₉ × s √( 1 + 1/n + (X* − X̄)² / Σ(Xᵢ − X̄)² )

where 99 indicates the level of confidence and X* is the value used in the prediction, i.e. 550.
Regression analysis

The standard error of the regression is s = 114.27. The prediction interval is

Ŷ ± t₉₉ × s √( 1 + 1/n + (X* − X̄)² / Σ(Xᵢ − X̄)² )
= 3476 ± 2.5 × 114.27 × √( 1 + 0.0192 + (550 − 391.42)²/108046.7 )
= 3476 ± 319.65

Thus we can consider with 99% confidence that if the S&P 500 rises to 550, the FTSE 100 will rise to 3476 ± 320, i.e. between 3156 and 3796.
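The interval arithmetic can be replayed with the quantities quoted in the text (s, X̄, Σ(Xᵢ − X̄)²; n ≈ 52 is inferred from 1/n = 0.0192):

```python
import math

# Quantities quoted in the text; n is inferred from 1/n = 0.0192
s, n = 114.27, 52
X_bar, Sxx = 391.42, 108046.7
X_star, Y_hat, t99 = 550.0, 3476.0, 2.5   # t99 approximated by 2.5, as in the text

half_width = t99 * s * math.sqrt(1 + 1 / n + (X_star - X_bar) ** 2 / Sxx)
lower, upper = Y_hat - half_width, Y_hat + half_width
```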
Regression analysis
Spurious regressions: economic and financial time series are usually nonstationary variables (they trend over time and have unit roots). Regressions with non-stationary variables are not valid (spurious). There are tests to check for unit roots, the most popular being the ADF (Augmented Dickey-Fuller) and the PP (Phillips-Perron) tests. Unit roots are eliminated by differencing the variables
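A quick numerical illustration of the last point: a random walk has a unit root, and first-differencing it recovers the stationary shocks (a sketch, not a formal ADF or PP test):

```python
import numpy as np

# A random walk (unit-root process) is non-stationary; its first
# difference recovers the stationary shocks that generated it.
rng = np.random.default_rng(42)
shocks = rng.normal(0.0, 1.0, 500)   # stationary white noise
walk = np.cumsum(shocks)             # non-stationary random walk

diffed = np.diff(walk)               # first difference: stationary again
```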
[Figure: example of a non-stationary variable vs a stationary variable]
Regression analysis
Multiple regression: a regression model incorporating several independent variables is known as a multiple regression, i.e.

Y = α + β₁X₁ + β₂X₂ + ... + βₙXₙ + e

The true relationship is unknown and we have to estimate

Ŷ = α̂ + β̂₁X₁ + β̂₂X₂ + ... + β̂ₙXₙ

The βs are the partial derivatives of Y with respect to the Xs, i.e.

β₁ = ∂Y/∂X₁; β₂ = ∂Y/∂X₂; ...; βₙ = ∂Y/∂Xₙ
Regression analysis

Computer packages (EViews, SPSS, RATS, etc.) are used to solve multiple regressions. Example of results given by software (data in Appendix 6.2, n = 51), with t-statistics in parentheses:

Ŷ = 0.215 + 0.209X₁ + 0.934X₂ + 0.302X₃
    (0.39)  (1.02)   (6.42)   (2.54)

The assumptions for the multivariate OLS are the same as for the univariate model. However, the multivariate model has the additional assumption that the independent variables are independent of each other, i.e. cov(Xⱼ, Xₖ) = 0 for j ≠ k.
Regression analysis
Interpretation of results:
Regression analysis

Adjusted R²: in multivariate regressions, adding explanatory variables will cause R² to increase. Consequently, R² must be adjusted to take this into account:

R̄² = 1 − (1 − R²)(n − 1)/(n − k)

where n = number of observations and k = number of regressors (including the constant).
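The adjustment is a one-line helper; for fixed R², R̄² falls as k rises, which is the penalty for adding regressors:

```python
def adjusted_r2(r2, n, k):
    """R-bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)
```

Note that with k = 1 the formula leaves R² unchanged, as it should.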
The 1% critical value of the F statistic for 2 DF in the numerator and 48 in the denominator = 5.08 As the decision rule for testing H0 that R2 = 0 is to reject H0 if F > critical value, we reject H0.
Regression analysis

Heteroscedasticity: a test regresses the residuals on a power of X,

eᵢ = α + βXᵢ^H

where X is the independent variable assumed to be the cause of the heteroscedasticity and H is the power of the relationship (2, 1/2, ...). The variance of the errors then becomes E(σᵢ²) = σ²Xᵢ^H. Thus if H = 2, so that the error standard deviation is proportional to Xᵢ, we would transform the regression model to

Yᵢ/Xᵢ = α(1/Xᵢ) + β + eᵢ/Xᵢ
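A sketch of this transformation, under the assumption H = 2 (error standard deviation proportional to X): dividing through by X and regressing Y/X on 1/X recovers α as the slope and β as the intercept.

```python
import numpy as np

# Simulated heteroscedastic data: sd of e proportional to X (H = 2)
rng = np.random.default_rng(7)
X = rng.uniform(1.0, 10.0, 5000)
Y = 3.0 + 2.0 * X + X * rng.normal(0.0, 1.0, 5000)

# Transformed model Y/X = alpha*(1/X) + beta + e/X has constant error
# variance; in the OLS fit on the transformed data the slope estimates
# alpha and the intercept estimates beta
alpha_hat, beta_hat = np.polyfit(1.0 / X, Y / X, 1)
```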
Autocorrelated errors can be modelled as autoregressive (AR) processes. First order, AR(1):

eₜ = ρeₜ₋₁ + zₜ

Higher-order processes:

AR(2): eₜ = ρ₁eₜ₋₁ + ρ₂eₜ₋₂ + zₜ
AR(4): eₜ = ρ₁eₜ₋₁ + ρ₂eₜ₋₂ + ρ₃eₜ₋₃ + ρ₄eₜ₋₄ + zₜ
Regression analysis

Test for 1st-order autocorrelation: the Durbin-Watson test

DW = Σ(eₜ − eₜ₋₁)² / Σeₜ²

To test for autocorrelation we test the following null hypothesis:
H₀: no autocorrelation, if dU ≤ d ≤ 4 − dU
H₁: positive autocorrelation, if d < dL; negative autocorrelation, if d > 4 − dL
Inconclusive: dL < d < dU or 4 − dU < d < 4 − dL

[Figure: DW decision regions along the scale 0, dL, dU, 2, 4 − dU, 4 − dL, 4]
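The statistic itself is two lines of NumPy; white-noise residuals give DW near 2, perfectly alternating residuals push it toward 4, and constant residuals give 0:

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
dw_white = durbin_watson(rng.normal(0.0, 1.0, 2000))  # ~2: no autocorrelation
```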
Regression analysis

Autocorrelation may be caused by omitted variables or a wrong functional form. It can also arise when lagged variables are introduced. To solve the autocorrelation problem:
Consider the possibility of, and correct for, omitted variables or a wrong functional form
If this is unsuccessful, use the Cochrane-Orcutt procedure (skip):
Calculate the autocorrelation coefficient

ρ̂ = Σeₜeₜ₋₁ / Σeₜ²

Change the equation to

Yₜ − ρ̂Yₜ₋₁ = α(1 − ρ̂) + β(Xₜ − ρ̂Xₜ₋₁) + zₜ

This will remove the 1st-order autocorrelation from the data.
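One iteration of the procedure can be sketched as below, using the slide's ρ̂ formula (Σeₜeₜ₋₁/Σeₜ²); the helper name is mine:

```python
import numpy as np

def cochrane_orcutt_step(Y, X, e):
    """Quasi-difference Y and X with rho estimated from the OLS residuals e."""
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)  # rho-hat, as on the slide
    Y_star = Y[1:] - rho * Y[:-1]                   # Y_t - rho * Y_{t-1}
    X_star = X[1:] - rho * X[:-1]                   # X_t - rho * X_{t-1}
    return rho, Y_star, X_star
```

OLS is then re-run on (Y*, X*), and the step can be repeated until ρ̂ converges.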
[Figure: dummy variables in Y = α + β₁X + e — a shift dummy changes the intercept; a slope dummy changes the slope]
Data transformations

Non-linear relationships can be transformed into linear ones. E.g. the power function

Y = αX^β

[Figure: curves of Y = αX^β for β > 1 and β < 1]

is linearized by taking logs:

ln Y = ln α + β ln X

Another example is the reciprocal transformation:

Y = α + βZ, with Z = 1/X
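The log transformation can be verified on noiseless made-up power-law data (α = 2, β = 1.5): OLS on the logs recovers both parameters exactly.

```python
import numpy as np

# Y = alpha * X^beta  =>  ln Y = ln alpha + beta * ln X
alpha_true, beta_true = 2.0, 1.5
X = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
Y = alpha_true * X ** beta_true

# OLS on the logged data: slope estimates beta, intercept estimates ln(alpha)
beta_hat, ln_alpha_hat = np.polyfit(np.log(X), np.log(Y), 1)
alpha_hat = np.exp(ln_alpha_hat)
```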