Y: dependent variable
B1: intercept or constant
Y = ^Y + u, and the size of the residuals defines how well your model fits!
Why the error term exists: outliers, complex reality (hard-to-measure things, such as a sunny
day affecting financial markets), using a linear model when the phenomenon is not linear,
missing/excess variables (bias).
Ui = Yi - ^Yi
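The fit and the residuals above can be sketched in a few lines of Python with numpy (the data here is made up for illustration):

```python
import numpy as np

# Hypothetical toy data: fit Y = b1 + b2*X by OLS, then compute residuals u_i = Y_i - ^Y_i
rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 10)

X = np.column_stack([np.ones_like(x), x])     # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates (b1, b2)
y_hat = X @ beta                              # fitted values ^Y
u = y - y_hat                                 # residuals

print(beta)     # roughly [2, 3]
print(u.sum())  # ~0: with an intercept, OLS residuals sum to zero
```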
Properties of OLS:
Gauss-Markov: OLS estimators are BLUE: best linear unbiased estimator
Alternative model: minimize the sum of the absolute values of the residuals (less used than OLS).
Under all 5 assumptions, OLS is BLUE! = minimum variance among all unbiased estimators of betas
that are linear functions of Ys
6. ui is normally distributed, with mean zero and variance sigma^2. Needed only for inference.
With assumption 6: CNLRM; OLS is BUE: no model, linear or not, is better than yours.
Hypothesis testing
Confidence interval:
2 standard deviations: cover roughly 95% of observations. Outside the interval: reject.
The null hypothesis is rejected if the p-value is less than the significance level (α).
The α level is the probability of rejecting the null hypothesis given that it is true (Type I
error) and is most often set at 0.05 (5%)
p-value:
compare it to the significance level
o if p-value <= α, reject H0
o high p-values are compatible with H0: if p-value > α, accept H0
o note: a two-tailed test (≠) splits α, so if α = 5%, that is 2.5% in each tail
o if the alternative is < or >, the test is one-tailed
t statistic:
t = B1/se(B1). Shows how many standard errors the estimate is above/below zero.
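A minimal sketch of the t statistic and its two-tailed p-value, using scipy; the estimate, standard error, and degrees of freedom below are hypothetical numbers, not from any regression in these notes:

```python
from scipy import stats

# Hypothetical values: slope estimate, its standard error, degrees of freedom (n - k)
beta_hat, se, df = 0.45, 0.18, 28

t = beta_hat / se                          # how many standard errors away from zero
p_two_sided = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value (alpha split between the tails)

alpha = 0.05
print(t, p_two_sided)
print("reject H0" if p_two_sided <= alpha else "do not reject H0")
```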
Goodness of Fit
If R2 = 0, it means your model is equal to the average. Models add value when they can
beat the average
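The "beating the average" idea can be checked directly from the definition R² = 1 - RSS/TSS; the toy series below is invented for illustration:

```python
import numpy as np

# R^2 = 1 - RSS/TSS: a model that just predicts the mean gets R^2 = 0.
def r_squared(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1 - rss / tss

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(r_squared(y, np.full_like(y, y.mean())))  # 0.0: equal to the average, no added value
print(r_squared(y, y))                          # 1.0: perfect fit
```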
R2 Adjusted:
Considers the increase in the number of regressors. Widely used in multiple regression.
o Adding variables that are irrelevant to the model (p-value compatible with H0)
tends to increase R2; adjusted R2 penalizes this.
Just because the p-value is high does not mean the variable is irrelevant to the model.
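The penalty can be seen in the usual formula adj R² = 1 - (1 - R²)(n - 1)/(n - k), with k counting all estimated parameters including the intercept; the R² values below are hypothetical:

```python
# Adjusted R^2 penalizes extra regressors:
# adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k), n = observations, k = parameters (with intercept)
def adj_r_squared(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Hypothetical: an irrelevant regressor nudges R^2 up, yet adjusted R^2 falls.
print(adj_r_squared(0.691, 50, 3))
print(adj_r_squared(0.692, 50, 4))
```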
E(ui) = 0 (zero mean of the errors)
Using the p-value: high p-values are compatible with H0. Since p = 0.009, it is compatible
with H1: non-normality!
In this other case, α = 0.05 and p-value = 0.03: not normal. With an α of 0.025, we would accept the
normality hypothesis.
In the Vale case, with a large sample (containing the crisis): non-normality. When we
shrink the sample to only the 2003-2007 period, we reach a p-value of
83%, compatible with H0 (normality!)
- Normality (ui)
- Collinearity of the independent variables
- Var(ui) = constant
Multiple Regression
Assumption 7 of the Classical Linear Regression Model: no perfect collinearity between the
independent variables. There can't be any exact linear relationship between two Xs. Nonlinear:
no problem!
When one independent variable is a linear combination of another, that is a problem!!
The model cannot be run, because it is impossible to separate the effects of the betas.
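Why the model cannot be run: with an exact linear relation, X'X loses rank and cannot be inverted. A small numerical sketch with made-up data:

```python
import numpy as np

# If X3 = 2 * X2 (exact linear relation), X'X is singular: the betas cannot be separated.
x2 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x2, 2 * x2])  # third column is a linear combination

rank = np.linalg.matrix_rank(X.T @ X)
print(rank)  # 2, not 3: X'X is not invertible, so OLS has no unique solution
```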
Keep the nonsignificant variables: even if a variable is statistically = 0, eliminating it can
create other problems (especially for the intercept!!). It could be significant in another sample,
for example.
- H0: B2 = B3 = 0
- H1: not H0. If any of the parameters is different from zero, we reject the null hypothesis.
There is only one outcome for H1, but it splits into 3 different combinations (both
betas different from zero, or just one of them)!
Compare the R2 of both models and keep the higher one, which will have the greater
predictive power!
The problem lies in comparing the R2 values. Ex: 0.691 vs 0.689: do a statistical test,
don't trust your eyes. Don't use t-statistics (good for comparing means): use F tests!!!
Most powerful way to decide for the best model: compare residual sum of squares (RSS)
instead of R2.
Most papers use RSS!!! Maximizing R2 automatically minimizes the residuals; even so,
RSS is the most common notation.
RSS always increases when you drop variables from the model (restrict it). If you drop an
important variable, it will increase a lot: reject H0 (B2 = B3 = 0)
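The restricted-vs-unrestricted comparison is the standard F test on the two RSS values; the numbers below (sample size, parameter counts, RSS values) are hypothetical:

```python
from scipy import stats

# F test of H0: B2 = B3 = 0, comparing restricted vs unrestricted RSS.
# Hypothetical values: q restrictions, k parameters in the unrestricted model.
rss_r, rss_ur = 120.0, 95.0
n, k, q = 60, 4, 2

F = ((rss_r - rss_ur) / q) / (rss_ur / (n - k))
p_value = stats.f.sf(F, q, n - k)

print(F, p_value)  # a big jump in RSS after restricting -> large F -> reject H0
```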
(excel)
Note: t tests are individual; they test one hypothesis at a time. The F test covers the family formed by all
the t tests, but you can also use F for an individual hypothesis: t2 = F!!! The p-value is the same.
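The t² = F identity for a single restriction can be verified numerically; the t value and degrees of freedom below are arbitrary:

```python
import numpy as np
from scipy import stats

# For one restriction, F = t^2 and the two tests give the same p-value.
t, df = 2.1, 30
p_t = 2 * stats.t.sf(abs(t), df)  # two-tailed t test
p_f = stats.f.sf(t ** 2, 1, df)   # F test with 1 numerator degree of freedom

print(np.isclose(p_t, p_f))  # True
```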
Ramsey's RESET test: based on the fitted values
Test to check whether the relationship between Y and the Xs is the same in all periods.
Test with a joint-hypothesis F test. A low p-value (reject H0) means the parameters differ in
each period, meaning there is structural change
H1: any of them is different: no structural stability; better to use the unrestricted model and
break the regression into different periods
Heteroskedasticity
One of the assumptions of the CLRM is that the variance of the error term is constant
(homoskedasticity). Heteroskedasticity = when the variance of the error term is NOT constant,
so the spread of the residuals changes across observations.
Under heteroskedasticity, OLS is no longer the best (it is still a linear unbiased estimator), since OLS errors are larger than in
WLS. The change in the standard deviation creates a bias, as it changes the computed t-statistic and
consequently the p-value, meaning it can lead to wrong decisions in hypothesis testing.
There are many ways to impose the weight. A possible rule is to give more weight to
information that is more precise (lower variance).
Ex: W = 1/var(ui)
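A WLS sketch with that weighting rule, on simulated data where the error variance grows with X; here var(ui) is assumed known for illustration, while in practice it must be estimated:

```python
import numpy as np

# WLS: weight each observation by w_i = 1/var(u_i), so noisier points count less.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
sigma2 = x ** 2                                  # error variance grows with x: heteroskedasticity
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(sigma2))

X = np.column_stack([np.ones_like(x), x])
w = 1.0 / sigma2                                 # more weight to the more precise observations
w_sqrt = np.sqrt(w)

# WLS = OLS on the reweighted data (multiply each row by sqrt(w_i))
beta_wls, *_ = np.linalg.lstsq(X * w_sqrt[:, None], y * w_sqrt, rcond=None)
print(beta_wls)  # close to the true [1, 2]
```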
Autocorrelation