
Cumulative dist. func.:
Discrete: $P(a < X < b) = \sum_{a < x_i < b} P(X = x_i)$.
Continuous: $P(a < X < b) = \int_a^b f(x)\,dx$.
For continuous CDF: $P(X = a) = 0$, $P(X \le a) = P(X < a)$.

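Expected Value:
$E(X) = \sum_i x_i P(x_i)$ (discrete)
$E(X) = \int x f(x)\,dx$ (continuous)
Properties of the mean:
$E(E(X)) = E(X)$
$E(X + c) = E(X) + c$
$E((cX)^2) = c^2 E(X^2)$
$E(X + Y) = E(X) + E(Y)$
$E(X^2) \ge E^2(X)$
$E(XY) = E(X)E(Y)$ (independent X, Y)
Population variance: $\sigma^2(X) = \sum (x_i - \mu)^2 / N$
Sample variance: $s^2(X) = \sum (x_i - \bar{x})^2 / (n - 1)$
Matrix inverse: $A^{-1} = \mathrm{adj}(A)/\det(A)$
Covariance: $\mathrm{Cov}(X, Y) = E((X - E(X))(Y - E(Y)))$
Standard deviation: $\sigma = \sqrt{\mathrm{Var}(X)}$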

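Variance Properties: $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \pm 2\,\mathrm{Cov}(X, Y)$
$\mathrm{Var}(X + c) = \mathrm{Var}(X)$
$\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$
Independent variables have Cov(X, Y) = 0, hence:
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ &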
(, ) = ( ) (like variance)
Plim means probability limit: the limit in probability as $n \to \infty$.
$\mathrm{Var}(X) = E(X^2) - \mu^2$, so $E(X^2) = \mathrm{Var}(X) + \mu^2 = \sigma^2 + \mu^2$
$E(\bar{X}^2) = \mathrm{Var}(\bar{X}) + \mu^2 = \sigma^2/n + \mu^2$
Unbiasedness of $s^2$: $E(s^2) = \frac{1}{n-1} E\left(\sum x_i^2 - n\bar{x}^2\right) = \frac{n}{n-1}\left(\sigma^2 + \mu^2 - (\sigma^2/n + \mu^2)\right) = \sigma^2$
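Proof of $\mathrm{Var}(X) = E(X^2) - \mu^2$:
$\mathrm{Var}(X) = E((X - \mu)^2)$
$= E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2$
$= E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2$
$\rho(X, Y) = \mathrm{Cov}(X, Y) / \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$
Note: $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
$\mathrm{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y))$
$= E(XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y)$
$= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y$
$= E(XY) - \mu_X \mu_Y$
Physical meaning of $\alpha$ ($\alpha = 0.05$, confidence level is 0.95): the probability of a type 1 error in HT is 5%.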
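The shortcut identities for the variance and covariance can be checked numerically. The sketch below is only an illustration: the data values and the helper names (`mean`, `variance`, `covariance`, `correlation`) are made up for this example, and population (divide by n) moments are assumed throughout.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance, E((X - mu)^2)."""
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])

def covariance(xs, ys):
    """Population covariance, E((X - mu_x)(Y - mu_y))."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

# Hypothetical data, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 6.0, 8.0]

# Shortcut forms: Var(X) = E(X^2) - mu^2 and Cov(X, Y) = E(XY) - mu_x * mu_y.
var_shortcut = mean([v * v for v in x]) - mean(x) ** 2
cov_shortcut = mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

print(variance(x), var_shortcut)
print(covariance(x, y), cov_shortcut)
print(correlation(x, y))
```

Both printed pairs should agree up to floating-point rounding, and the correlation should lie in [-1, 1].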

Cofactor: $\mathrm{cof}(a_{ij}) = (-1)^{i+j} \det(A \text{ without row } i \text{ and column } j)$


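For a diagonal matrix only, with diagonal entries $a_{ii}$: $A^n$ is diagonal with entries $a_{ii}^n$.
If X and Y are independent: $E(XY) = E(X)E(Y)$, so $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$ and $\rho = 0$.
In a normal distribution, 68% of the data lie within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$.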

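$|\rho| \le 1$. Proof: for any constant $t$,
$E\left(((X - \mu_X) - t(Y - \mu_Y))^2\right) \ge 0$
$E\left((X - \mu_X)^2 - 2t(X - \mu_X)(Y - \mu_Y) + t^2 (Y - \mu_Y)^2\right) \ge 0$
$\mathrm{Var}(X) - 2t\,\mathrm{Cov}(X, Y) + t^2\,\mathrm{Var}(Y) \ge 0$
Take $t = \mathrm{Cov}(X, Y)/\mathrm{Var}(Y)$:
$\mathrm{Var}(X) - \frac{2\,\mathrm{Cov}^2(X, Y)}{\mathrm{Var}(Y)} + \frac{\mathrm{Cov}^2(X, Y)}{\mathrm{Var}(Y)} \ge 0 \Rightarrow \frac{\mathrm{Cov}^2(X, Y)}{\mathrm{Var}(X)\,\mathrm{Var}(Y)} \le 1$
If $Y = a + bX$:
$\rho = \frac{E((X - \mu_X)(a + bX - a - b\mu_X))}{\sqrt{(E(X^2) - \mu_X^2)(E((a + bX)^2) - (a + b\mu_X)^2)}} = \frac{b\,E((X - \mu_X)^2)}{\sqrt{(E(X^2) - \mu_X^2)\,b^2\,(E(X^2) - \mu_X^2)}} = \frac{b}{|b|} = \pm 1$
Law of large numbers: $(\bar{X} - \mu) \to 0$ as $n \to \infty$.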

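Variance conf. interval: $P\left[\frac{(n-1)s^2}{\chi^2_{df=(n-1),\,\alpha=0.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{df=(n-1),\,\alpha=0.975}}\right] = 0.95$

F-distribution: $F = \frac{\chi^2(\nu_1)/\nu_1}{\chi^2(\nu_2)/\nu_2}$. $F(\nu_1, \nu_2) \ne F(\nu_2, \nu_1)$.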

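Test of difference between means of two variables: $\bar{X} - \bar{Y}$ has a distribution $N\left((\mu_x - \mu_y),\ \sigma_x^2/n + \sigma_y^2/m\right)$
$\mathrm{Var}(\sum X) = \mathrm{Var}(x_1 + x_2 + \dots) = n\sigma_x^2$
$\mathrm{Var}(\sum Y) = \mathrm{Var}(y_1 + y_2 + \dots) = m\sigma_y^2$
$\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + (-1)^2\,\mathrm{Var}(\bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y})$
$= \mathrm{Var}(\sum X/n) + \mathrm{Var}(\sum Y/m) = n\sigma_x^2/n^2 + m\sigma_y^2/m^2 = \sigma_x^2/n + \sigma_y^2/m$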

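Pooled test: 1) assumes the variance of the 2 groups is equal 2) allows bigger degrees of freedom. If given the (s) of the two groups, then use ($s_p^2$) as variance.
The variance of the pool is $s_p^2$:
$s_p^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n + m - 2} = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}$
$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{1/n + 1/m}}$, $df = n + m - 2$
using $s_x^2 = \sum (x_i - \bar{x})^2/(n-1) \Rightarrow \sum (x_i - \bar{x})^2 = s_x^2 (n-1)$
and $s_y^2 = \sum (y_i - \bar{y})^2/(m-1) \Rightarrow \sum (y_i - \bar{y})^2 = s_y^2 (m-1)$
The tests discussed above are for independent groups.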
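As a worked illustration of the pooled test, the sketch below computes $s_p^2$, the t statistic and the degrees of freedom with the standard library only. The function name `pooled_t` and the two samples are hypothetical; comparing t against a critical value from a t table is left out.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (divides by n - 1)

def pooled_t(x, y):
    """Pooled two-sample t statistic, assuming equal group variances."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2, sp2

# Hypothetical samples, for illustration only.
group_x = [1.0, 2.0, 3.0, 4.0, 5.0]
group_y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df, sp2 = pooled_t(group_x, group_y)
print(t_stat, df, sp2)
```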
x-y
2
2nm The tests discussed
unpooled test: t=
, df=
=
next (paired test
1 1 n+m
+
pooled test) are for
sx2 /n2 +sy2 /m2

related groups
(example of related groups: same group of people before and after drug trial)
1) You must have the values of all measurements. Calculate the difference between each two measurements for each individual. Create the group (d = y - x).
2) Calculate the mean of the group ($\bar{d}$) and the unbiased std ($s_d$).
Calculate $t = \bar{d} / (s_d/\sqrt{n})$, $df = n - 1$.
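The paired procedure can be sketched in a few lines. The function name `paired_t` and the before/after values are made up for illustration; the sketch assumes the differences d = y - x are approximately normal.

```python
import math
from statistics import mean, stdev  # stdev() is the unbiased (n - 1) sample std

def paired_t(x, y):
    """Paired t statistic on the differences d = y - x."""
    d = [after - before for before, after in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements of the same individuals before and after a treatment.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 15.0, 10.0, 14.0]
t_paired, df_paired = paired_t(before, after)
print(t_paired, df_paired)
```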
All the tests discussed previously are used when the groups are normally distributed. If the groups are not normally distributed, special tests are used (e.g. non-parametric tests such as the Wilcoxon test).
The previous tests are for continuous variables. For discrete/binary variables:
$\chi^2 = \sum \frac{(F_{observed} - F_{expected})^2}{F_{expected}}$, $F_{expected} = \frac{(\text{row total})(\text{column total})}{\text{total}}$, $df = (R - 1)(C - 1)$
R: # rows
C: # columns
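A minimal sketch of the chi-square statistic for an R x C table of counts follows. The function name `chi_square_independence` and the 2 x 2 table are hypothetical, and the expected counts use the usual (row total)(column total)/total rule.

```python
def chi_square_independence(table):
    """Chi-square statistic and df = (R - 1)(C - 1) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table of observed frequencies.
observed_counts = [[10, 20], [30, 40]]
chi2_stat, chi2_df = chi_square_independence(observed_counts)
print(chi2_stat, chi2_df)
```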

Proof: = 2 / + 2 /

Central limit theorem: $Y = a_1 X_1 + a_2 X_2 + a_3 X_3 + \dots$, $\mu_Y = a_1\mu_1 + a_2\mu_2 + \dots$, $\sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots$ (independent $X_i$)
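Hypothesis testing: use (Z) when n>30, use (t) if n<30.
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$: how many std the measured mean is far from the null hypothesis.
Normal distribution $N(\mu, \sigma^2)$ has $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$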
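Mean's std: take a population and divide it into many cells; each cell's name is capital (X), and its size is (n). For each sample (capital X):
$\mathrm{Var}(\sum X) = \mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \dots = n\sigma^2$
$\mathrm{Var}(\bar{X}) = \mathrm{Var}(\sum X/n) = \mathrm{Var}(\sum X)/n^2 = n\sigma^2/n^2 = \sigma^2/n$
$S.E. = \sigma(\bar{X}) = \sigma/\sqrt{n}$
Level of significance: $\alpha$ = P(error 1), $\beta$ = P(error 2)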
Type 1 error: rejected H0 (although it is true).
Type 2 error: did NOT reject H0 (although it is false).
If (n) is fixed: decreasing one type of error increases the other. Hence: increase n.
Level of significance ($\alpha$): lower bound of probability that you cannot reject the possibility of getting the result you get given that the null hypothesis is true.
ANOVA for HT on >2 groups.
$\chi^2 = \sum (F_{observed} - F_{expected})^2 / F_{expected}$
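This is the ordinary case (2 groups): compare means of 2 samples, t-test with $df = n_1 + n_2 - 2$:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$; $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
This is ANOVA (>2 groups): $F = MS_{between}/MS_{within}$
$MS_{between} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_K(\bar{x}_K - \bar{x})^2}{K - 1}$
$MS_{within} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_K - 1)s_K^2}{N - K}$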

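For tests on the variance, we use the chi-square and F tests.
$\chi^2_{(1)} = (x - \mu)^2/\sigma^2 = z^2$, $df = 1$
$\chi^2_{(n-1)} = (n - 1)s^2/\sigma^2$; the difference from the definition above is the df.
You can prove it using: $E(\sum (x_i - \bar{x})^2) = (n - 1)\sigma^2$
In hypothesis testing on variance, $\sigma^2$ is the hypothesized variance.
To compare two samples: $\chi^2_{(2)} = z_1^2 + z_2^2$, $df = 2$
N is population size ($N = \sum n_i = n_1 + n_2 + \dots$); K is how many groups.
MS between groups is due to the effect of the drug. MS within group is due to chance.
For $F = MS_{between}/MS_{within}$: df1 for F = K-1, df2 for F = N-K.
Remember to convert (S) given in the question: $\hat{\sigma}^2 = S^2 (n - 1)/n$
Skewness $= E((x - \mu)^3)/\sigma^3 = 0$ and kurtosis $= E((x - \mu)^4)/\sigma^4 = 3$ for a normal distribution.
$JB = n\left[(\text{skewness})^2/6 + (\text{kurtosis} - 3)^2/24\right]$
Large n>800: JB has a $\chi^2$ distribution. Otherwise use the Bowman-Shelton table.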
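The Jarque-Bera statistic can be sketched directly from the skewness and kurtosis definitions. The function name `jarque_bera` and the data are hypothetical, and the moments are assumed to use the divide-by-n variance, in line with the (n-1)/n conversion note.

```python
def jarque_bera(values):
    """Return (skewness, kurtosis, JB) using divide-by-n central moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)
    return skewness, kurtosis, jb

# Hypothetical symmetric sample: skewness is 0, kurtosis is below 3.
skew, kurt, jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, kurt, jb)
```

With such a tiny sample the chi-square approximation does not apply; the example only shows the arithmetic.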
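Estimator: a math. function to estimate a population parameter from a sample.
For a column vector A [n x 1], $AA^T$ is symmetric and diagonal($AA^T$) is A.^2.
Good estimator ($\hat{\theta}$): Unbiasedness, Efficiency, Consistency, MSE.
Unbiasedness: $E(\hat{\theta}) = \theta$.
Efficiency: $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\tilde{\theta})$ for any other estimator $\tilde{\theta}$. Example: estimator of the mean, $\mathrm{Var}(\bar{x}) = \sigma^2/n$.
Consistency: $\lim_{n \to \infty} E(\hat{\theta}) = \theta$ and $\lim_{n \to \infty} \mathrm{Var}(\hat{\theta}) = 0$.
MSE = variance + bias$^2$ $= E((\hat{\theta} - \theta)^2)$; $E((\text{estimate} - \theta)^2) \le E((\text{any other} - \theta)^2)$.
If $\epsilon \sim N(0, \sigma^2)$, $E(\epsilon_i \epsilon_j) = 0$ for $i \ne j$, and $E(\epsilon_i^2) = \sigma^2$, then $E(\epsilon\epsilon')$ is a diagonal matrix with $\sigma^2$, and $\hat{\beta}$ is BLUE.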

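GM Assumptions: $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = \sigma^2$, $E(X'\epsilon) = 0$
Y: regressand, explained, dependent variable
x: regressor, independent, carrier variable
K: # of regressors (including $1 \cdot x_0$)
Remember to include a vector of 1's in your (X) matrix for ($x_0$).
$E(\epsilon_i \epsilon_j) = 0$ for $i \ne j$, $E(\epsilon_i^2) = \sigma^2$. See the rule above on vector A: $E(\epsilon\epsilon')$ is a diagonal matrix; all elements of the diagonal are $\sigma^2$.
$\hat{\beta} = (X'X)^{-1}X'[X\beta + \epsilon] = \beta + (X'X)^{-1}X'\epsilon$; $\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$
Restricted estimator: $\beta^* = \hat{\beta} + (X'X)^{-1}R\lambda$, with $\lambda = (R'(X'X)^{-1}R)^{-1}(r - R'\hat{\beta})$
Substitute $\lambda$: $\beta^* = \hat{\beta} + (X'X)^{-1}R\,(R'(X'X)^{-1}R)^{-1}(r - R'\hat{\beta})$
$\beta^*$ is unbiased, $E(\beta^*) = \beta$, if ($r - R'\beta = 0$).
RMSE: Root Mean Square Error. MLE: Maximum Likelihood Estimation.
Unbiased $\hat{\sigma}^2$: $\hat{\sigma}^2 = e'e/(N - K) = (Y - X\hat{\beta})'(Y - X\hat{\beta})/(N - K)$; $RMSE = \hat{\sigma} = se$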
E((2 ) = 2 (), ( ) = 0 ()
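For a biased estimator, or an inefficient unbiased estimator: $MSE = E((\hat{\theta} - \theta)^2) = \mathrm{Var}(\hat{\theta}) + (\text{bias})^2$; for the sample mean, $MSE(\bar{x}) = \sigma^2/n + 0$.
Example in slides: X for predicted & real are matching: divide by n. X for predicted & real are different, use a model: divide by n-K.
Derive the OLS estimator: our objective function is $\min(\sum \epsilon_i^2)$:
$\min(\epsilon'\epsilon) = (Y - X\beta)'(Y - X\beta) = (Y' - \beta'X')(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta$
To minimize, find the value of $\beta$ where the derivative = 0:
$\partial(\epsilon'\epsilon)/\partial\beta = -2X'Y + 2X'X\beta = 0 \Rightarrow \hat{\beta} = (X'X)^{-1}X'Y$
$\partial^2(\epsilon'\epsilon)/\partial\beta^2 = 2X'X \ge 0$. The function is convex, so the optimum is a min.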
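For one regressor plus a column of ones, the normal equations $(X'X)\beta = X'Y$ reduce to a 2 x 2 system that can be solved by hand. The sketch below does exactly that with the standard library; the function name `ols_simple` and the data are hypothetical.

```python
def ols_simple(x, y):
    """Solve (X'X) b = X'Y for X = [1, x]; returns (b0, b1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'Y = [sy, sxy]
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x, so the fit recovers (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_simple(xs, ys)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(xs, ys)]
sigma2_hat = sum(e * e for e in residuals) / (len(xs) - 2)  # e'e / (N - K), K = 2
print(b0, b1, sigma2_hat)
```

For more regressors a linear-algebra library would normally be used instead of the closed-form 2 x 2 inverse.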
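Proof: Unbiased. $E(\hat{\beta}) = E((X'X)^{-1}X'Y) = (X'X)^{-1}X'E[X\beta + \epsilon] = \beta + (X'X)^{-1}E(X'\epsilon)$. $(X'X)^{-1} \ne 0$, so if $E(X'\epsilon) = 0$, we get $E(\hat{\beta}) = \beta$.
$\hat{\sigma}^2 = s^2 = e'e/(N - K)$; $RMSE = \hat{\sigma}$
About 68% of the points on a scatter diagram will be within 1 RMSE of the regression line; about 95% of them will be within 2 RMSE.
VCV (Variance-Covariance): $VCV(\hat{\beta}) = \sigma^2 [X'X]^{-1}$. Proof:
$VCV(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']$, with $\hat{\beta} = \beta + (X'X)^{-1}X'\epsilon$
$= E([(X'X)^{-1}X'\epsilon][(X'X)^{-1}X'\epsilon]') = (X'X)^{-1}X'E(\epsilon\epsilon')X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$
$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$
MLE: maximizing the likelihood that the extracted sample represents the population: distribution of y conditional on parameters $\theta$.
Pdf: $f(y|\theta)$
Likelihood function: $L(\theta|y)$ (opposite) $= \prod f(y_i|\theta)$
Example: L(p|y) for the binomial distribution: $\frac{n!}{y!\,(n - y)!}\, p^y (1 - p)^{n - y}$
MLE($\theta$) is the value that maximizes the joint product of densities (pdf): $\hat{\theta} = \arg\max(L(\theta))$, or say: find ($\hat{\theta}$) for max $L = L(\hat{\theta})$; $L = \prod f(y_i|\theta)$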
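Normal density: $f(y_i) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$
For the regression model: $f(y_i|\theta) = f(y_i|\{x_i, \beta, \sigma^2\}) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right)$
n: # observations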
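t-test on a coefficient: $t = (\hat{\beta}_i - \beta_i)/s.e.(\hat{\beta}_i) = \hat{\beta}_i/s.e.(\hat{\beta}_i)$ under $H_0: \beta_i = 0$, df=N-K
OLS Assumptions: 1) model is linear in the parameters. 2) fixed X in repeated sampling 3) $\mathrm{Cov}(X, \epsilon) = 0$ 4) $E(\epsilon) = 0$ 5) $\mathrm{Var}(\epsilon) = \sigma^2$ is constant. 6) N > K 7) $\mathrm{corr}(x_i, x_j) \ne 1$
$\hat{y}$: fitted value. $y$: measured (actual) value. $\bar{y}$: sample mean, mean of (y).
$SST = \sum (y_i - \bar{y})^2$, $SSE = \sum (\hat{y}_i - \bar{y})^2$, $SSR = \sum (y_i - \hat{y}_i)^2$; $SST = SSE + SSR$
$R^2 = 1 - \frac{(y - \hat{y})'(y - \hat{y})}{(y - \bar{y})'(y - \bar{y})}$; $R^2 = SSE/SST = 1 - SSR/SST$
$R^2$ represents the proportion of variability in (Y) that is accounted for by the regression model.
Adjusted: $\bar{R}^2 = 1 - (1 - R^2)(N - 1)/(N - K - 1) < R^2$ (K counted without the intercept here); $\bar{R}^2 < R^2$ because $(N - 1)/(N - K - 1) > 1$.
Joint density: $f(\{y_1, y_2, \dots\}|\{X, \beta, \sigma^2\}) = \prod f(y_i|\{x_i, \beta, \sigma^2\}) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}\right)$
Since log(ab) = log(a) + log(b): $\ln L(\theta|y) = \ln \prod f(y_i|\theta) = \sum \ln f(y_i|\theta)$
$\ln L(\beta, \sigma^2|y, X) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$
Maximizing L with respect to $\beta$ is done by minimizing $(y - X\beta)'(y - X\beta)$: $\hat{\beta}_{MLE} = \hat{\beta}_{OLS} = (X'X)^{-1}X'y$
Derive $\sigma^2$ based on MLE: $\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{(y - X\beta)'(y - X\beta)}{2\sigma^4} = 0 \Rightarrow \hat{\sigma}^2_{MLE} = \frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n}$
$\hat{\sigma}^2_{MLE} = e'e/N = (N - K)\hat{\sigma}^2_{OLS}/N$, so $E(\hat{\sigma}^2_{MLE}) = (N - K)\sigma^2/N$
Restricted OLS: 1 coefficient: t-test; joint test: F-test.
UR: unrestricted model that we calculated. R: restricted model from the hypothesis. q: # of restrictions. df = {q, N-K}
The unrestricted model always has a better fit: $SSR_{UR} < SSR_R$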
$F_{(df_1 = q,\ df_2 = N - K)} = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(N - K)}$
In a joint hypothesis, the joint restrictions can be written as a linear system.
When you enforce ($\beta$) to certain values, you can't use the other ($\beta$) from the original model. You have to re-derive the whole model: derive the new $\beta$ under the restrictions.
Logistic regression: when y is discrete, binary or categorical.
If we use OLS with discrete Y, we will suffer: 1) $\epsilon$ are not normally distributed, 2) probabilities can be >1 or <0, 3) coefficient interpretation is meaningless. Hence, we need logistic regression.
Odds = p/(1-p), with 0 < p < 1
$\mathrm{logit} = \ln(p/(1 - p)) = \alpha + \beta x$, i.e. ln(odds) = logit
$p = 1/(1 + \exp(-\mathrm{logit})) = e^{\mathrm{logit}}/(1 + e^{\mathrm{logit}})$
odds = exp(logit)
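The relationship between probability, odds and logit can be checked with a few lines of standard-library Python. The function names `logistic`, `logit` and `odds` are illustrative only.

```python
import math

def logistic(z):
    """Map a logit value z in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def odds(p):
    return p / (1.0 - p)

p_half = logistic(0.0)            # logit 0 corresponds to p = 0.5 (even odds)
round_trip = logit(logistic(1.3)) # logit is the inverse of the logistic function
print(p_half, round_trip, odds(0.8), math.exp(logit(0.8)))
```

The last two printed values should match up to rounding, since odds = exp(logit).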

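MLE for logit: solve $\sum \{Y - p(Y = 1)\}X = 0$.
Wald statistic for $\beta$ = $[\hat{\beta}/(S.E.)]^2$. A positive $\beta$ increases the odds of y=1. R = (Wald - 2)/(-2LL($\alpha$)).
$K_{Restrct} = K_{UR}$ when $x_n$ only increases by 1.
LR $= 2[LL(\alpha, \beta) - LL(\alpha)] = 2[LL(\text{end}) - LL(\text{beginning})] \sim \chi^2$
Restricted least squares, $H_0: R'\beta = r$:
$\min(\epsilon'\epsilon) = (Y - X\beta)'(Y - X\beta)$ subject to $R'\beta = r$
Lagrangian: $L = (Y - X\beta^*)'(Y - X\beta^*) - 2\lambda'(R'\beta^* - r)$
$= Y'Y - 2\beta^{*\prime}X'Y + \beta^{*\prime}X'X\beta^* - 2\lambda'(R'\beta^* - r)$
$\partial L/\partial\beta^* = -2X'Y + 2X'X\beta^* - 2R\lambda = 0$
$\partial L/\partial\lambda = -2(R'\beta^* - r) = 0$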

Evaluation of logit: 1) model $\chi^2$ (LR) 2) correct predictions % 3) McFadden $R^2 = 1 - (LL(\alpha, \beta)/LL(\alpha))$
Specification errors: incorrect functional form; omitted variables → bias; irrelevant variables → inefficiency.
Diagnosis, omitted variable: low adjusted $R^2$, t-value, RESET. RESET procedure: 1) regress $y = b_0 + b_1 x_1 + \dots$ 2) regress $y = b_0 + b_1 x_1 + \dots + z_1\hat{y}^2 + z_2\hat{y}^3 + z_3\hat{y}^4 + \dots$ 3) HT (F-test): all $z_i = 0$
Diagnosis, irrelevant variable: different tests on t, F & $R^2$
Diagnosis, functional form: linearity test; statistic significant → nonlinear
Measurement error in x → bias; measurement error in y → bigger variance.
Dummy variables: $\beta_0$ → mean of the reference group, $\beta_n$ → mean difference between group n and the reference. For (n) categories, use (n-1) different variables.
Interactive variable: XD, product of two regressors, exact multicollinear.
Panel data: 2-dimension data, x & time. Balanced: every x & t combination exists.
Distinct intercept: each individual has a distinct intercept, fixed over time.
Fixed effect model (procedure): demean x: $x^* = x_{it} - \bar{x}_i$. Use a separate $\bar{x}$ for each group. Calculate $\hat{\beta}$ with x* and y*.
$E(vx) \ne 0$: heterogeneity is correlated with x, use FE to remove v.
$E(vx) = 0$: heterogeneity is uncorrelated with x, use RE.
May face problem when N
=
adjusted SE : = ( )/( )
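Heteroskedasticity: $VCV(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = \sigma^2 (X'X)^{-1}X'\Omega X(X'X)^{-1}$
$S_0 = \frac{1}{N}\sum_{i=1}^{N} e_i^2 x_i x_i'$ to estimate $X'E(\epsilon\epsilon')X$
F = diagonal($\Omega$).^-0.5
Heterosked.: the regressor-regressand relation is not fixed. The estimator is still unbiased and consistent, but inefficient. SE are biased. $\mathrm{Var}(\epsilon_i) = \sigma_i^2$
Heterosked. tests: Goldfeld-Quandt, Breusch-Pagan, White, Park (known form of $\sigma$)
Park test: run the OLS regression, then run the regression $\ln(e_i^2) = \beta_0 + \beta_1 \ln(x_i)$, HT: $\beta_1 = 0$. If H0 is rejected, heterosked. is evident; if H0 is true, $\mathrm{Var}(\epsilon_i) = \sigma^2$.
If the form of $\sigma$ is unknown, start by plotting $e_i^2$ vs. $x_i$, then use the other tests.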
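For $y = \alpha + \beta_1 x_1 + \beta_2 x_2$:
White: regress $e^2 = f(\gamma_0 + \gamma_1 x_1 + \gamma_2 x_2 + \gamma_3 x_1^2 + \gamma_4 x_2^2 + \gamma_5 x_1 x_2)$
Breusch-Pagan: regress $e^2 = f(\gamma_0 + \gamma_1 x_1 + \dots)$; $H_0: \gamma_1 = \dots = 0$
Test statistic (both): $N R^2 \sim \chi^2(df = K_{TEST})$; $K_{TEST}$ = regressors in the test regression (#x) = #$\gamma$ - 1
$E(F\epsilon\epsilon'F') = \sigma^2 F\Omega F'$
GLS: $F\Omega F' = I$; $\Omega = F^{-1}F'^{-1}$; $F'F = \Omega^{-1}$; $Fy = FX\beta + F\epsilon \Rightarrow y^* = X^*\beta + \epsilon^*$
$\hat{\beta}_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$
$VCV(\hat{\beta}_{GLS}) = \sigma^2 (X^{*\prime}X^*)^{-1} = \sigma^2 (X'\Omega^{-1}X)^{-1}$
$\hat{\sigma}^2_{GLS} = (e^{*\prime}_{GLS} e^*_{GLS})/(N - K) = e'\Omega^{-1}e/(N - K)$; $E(\hat{\sigma}^2_{GLS}) = \sigma^2$
N = # observations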

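Autocorrelation: VCV($\hat{\beta}$) same as heterosked. above.
$\epsilon_t = \rho\epsilon_{t-1} + u_t$; $\epsilon_t = \sum_i \rho^i u_{t-i}$; $E(u_i) = E(\epsilon_t u_i) = 0$, $E(u_i^2) = \sigma_u^2$
$\mathrm{var}(\epsilon_t) = \rho^2\,\mathrm{var}(\epsilon_{t-1}) + \mathrm{var}(u_t)$, so $\sigma_\epsilon^2 = \rho^2\sigma_\epsilon^2 + \sigma_u^2 \Rightarrow \sigma_\epsilon^2 = \sigma_u^2/(1 - \rho^2)$
AC tests: Durbin-Watson, Durbin-a
$DW = \sum_{t=2}^{T}(e_t - e_{t-1})^2 / \sum_{t=1}^{T} e_t^2$. Use the DW table, or roughly: DW = 2 → no AC; DW > 2 → negative $\rho$; DW < 2 → positive $\rho$.
$H_0: \rho = 0$, df = (K-1, T). Tests 1st-order AC in the whole model.
Other AC test: Breusch-Godfrey, for >1st order AC. Different model of AC: $\epsilon_t = \sum_{i=1}^{p}\rho_i\epsilon_{t-i} + u_t$
Test: regress $e_t = f(\sum_{i=1}^{p}\rho_i e_{t-i}) + X\beta + u$; test-stat: $(T - p)R^2 \sim \chi^2_p$
AC remedies: GLS & ARMA
Auto-correlation: $\mathrm{covar}(\epsilon_t, \epsilon_{t-s}) = \rho^s\sigma_u^2(1 - \rho^2)^{-1}$
$\Omega = \frac{1}{1 - \rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & & & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}$
$\Omega^{-1} = \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1 + \rho^2 & -\rho & \cdots & 0 \\ 0 & -\rho & 1 + \rho^2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & -\rho \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$
$F = \begin{pmatrix} \sqrt{1 - \rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$, with $F'F = \Omega^{-1}$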
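The Durbin-Watson statistic is easy to compute from a residual series. The sketch below uses the textbook form $DW = \sum_{t=2}(e_t - e_{t-1})^2/\sum e_t^2$ and the rough relation $DW \approx 2(1 - \rho)$; the function name `durbin_watson` and the residual series are hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum over t >= 2 of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residual series.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every period: negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]       # same value every period: positive autocorrelation

dw_neg = durbin_watson(alternating)
dw_pos = durbin_watson(constant)
rho_rough = 1 - dw_neg / 2            # rough estimate from DW ~ 2(1 - rho)
print(dw_neg, dw_pos, rho_rough)
```

A value above 2 points to negative autocorrelation and a value below 2 to positive autocorrelation, matching the rule of thumb above; a formal decision still needs the DW table.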
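Violations of GM: 1) $\mathrm{Var}(\epsilon_i) = \sigma_i^2$: Heteroskedasticity 2) $E(\epsilon_i \epsilon_j) \ne 0$, $\epsilon_i$ and $\epsilon_j$ are correlated: Autocorrelation 3) $E(X'\epsilon) \ne 0$: Stochastic Regressor 4) rank(X) < K (the assumption is K = rank(X) and K ≤ N): Multicollinearity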

GLS: Heteroskedasticity: guess $\Omega$ → FGLS. Autocorrelation: calculate $\rho$ → FGLS.

Stochastic regressor:
Q1: are the explanatory variables correlated with the error terms? No: then there is no problem in the stochastic regressor. Yes: then Q2: are the explanatory variables correlated with the error terms in the same observation? Yes → GSRM // No → PISRM
Slides example: $y_t = \beta y_{t-1} + \epsilon_t$; $\epsilon_t = \rho\epsilon_{t-1} + v_t$. $y_{t-1}$ is related to $\epsilon_{t-1}$, and $\epsilon_t$ is also related to $\epsilon_{t-1}$ → $y_{t-1}$ and $\epsilon_t$ must be related (by transitivity) → the model is GSRM. Other example: $y_t = \alpha + \beta y_{t-1} + \epsilon_t$, with no relation between $\epsilon_t$ and $\epsilon_{t-1}$: $y_{t-1}$ is independent from $\epsilon_t$ → PISRM model.
Effects of SR: if stochastic but uncorrelated with the error: unbiased, efficient, consistent. If PISRM: small sample: biased; large sample: consistent. If GSRM: always inconsistent and biased.
2 stage LS: the model is $y = \alpha + \beta x + \epsilon$. We have 2 IVs, to be combined into 1: $x = \gamma_0 + \gamma_1 z_1 + \gamma_2 z_2$, $\hat{x} = \hat{\gamma}_0 + \hat{\gamma}_1 z_1 + \hat{\gamma}_2 z_2$; then $y = \alpha + \beta\hat{x} + \epsilon$. Use the regressed $\hat{x}$, not the true x. Use the information available to get $\hat{x}$, then evaluate the effect of x on y.
Instrumental variable, for stochastic regressor: check last example, meat is IV for calorie, asset is IV for age. z=[ones,meat,asset], and continue normally using: z'y, z'x.
Assume a transformation matrix (Z) exists, highly correlated with (x), not correlated with ($\epsilon$): $E(Z'X) \ne 0$, $E(Z'\epsilon) = 0$ → the model becomes
$Z'Y = Z'X\beta + Z'\epsilon \Rightarrow \hat{\beta}_{IV} = (Z'X)^{-1}Z'Y$
Proof IV consistent: $\mathrm{plim}(\hat{\beta}_{IV}) = \mathrm{plim}((Z'X)^{-1}Z'Y) = \mathrm{plim}((Z'X)^{-1}Z'(X\beta + \epsilon)) = \beta + \mathrm{plim}(Z'X/N)^{-1}\,\mathrm{plim}(Z'\epsilon/N) = \beta + \Sigma_{ZX}^{-1}(0) = \beta$
$VCV(\hat{\beta}_{IV}) = \sigma^2 (Z'X)^{-1}(Z'Z)(X'Z)^{-1}$
Procedure: add a vector of ones to Z, like the 1st column in X. Evaluate (Z'X), (Z'Y) and then $\hat{\beta}_{IV} = (Z'X)^{-1}Z'Y$.

Multicollinearity: when explanatory variables are correlated.
No multicollinearity → estimates of $\beta$ stay the same if the model was y(x1) or y(x1,x2). The marginal (individual) contribution of one regressor to reducing SSR is independent of the other regressor: $SSR(x_1) = SSR(x_1|x_2)$
Multicollinearity exists → some info is repeated, redundant, useless. (X'X) is NOT full rank. Estimates of $\beta_i$ change when you add a new regressor to the model (higher correlation → higher change in $\beta_i$). $SSR(X_1) \ne SSR(X_1|X_2)$. HT: $\beta_i = 0$ may yield different results ((t) changes). SE(x1|model with only x1) < SE(x1|model with multiple x)
Symptoms of MC: 1) small changes in an observation or regressor cause big changes in coefficients 2) high SE, small t-statistic of a coefficient while $R^2$ is relatively high (a high SE of the intercept is normal).
Diagnosis: for a model with 2 regressors only: correlation coefficient $|\rho|$ between the variables. For a model with >2 regressors: 1) variance inflation factor (VIF) 2) $R^2$ of the auxiliary regression > $R^2$ of the original model
MC always exists; the question is how much MC is present. Correlation between X and Y is NOT MC.
Auxiliary regression: regress $x_i$ against ALL OTHER Xs: $x_i = \gamma_0 + x_1\gamma_1 + x_2\gamma_2 + x_3\gamma_3$. Get the UNADJUSTED ($R_i^2$).
$VIF(\beta_i) = (1 - R_i^2)^{-1}$. Bigger VIF, bigger MC.
Critical VIF: 5 or 10. More regressors (K), bigger critical VIF.
VIF problems: no formal decision rule. Necessary but not sufficient test for MC (MC can exist with a small VIF).
Another test of MC is: $TOR = 1/VIF = (1 - R_i^2)$; TOR=0 → perfect corr.
Other diagnosis: a new variable causes a sign change, decline in $R^2$.
Remedy: if the cause is not in the data, change the functional form or the sampling strategy. Otherwise: nothing, or drop the redundant variable, transform the correlated variables to a new 3rd variable that is a ratio or product, or increase the sample size so that $VCV(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$, with $\hat{\sigma}^2 = e'e/(N - K)$, is smaller.
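For a model with exactly two regressors, the auxiliary regression of one regressor on the other has an $R^2$ equal to their squared correlation, so VIF and TOR follow directly. The sketch below covers only that special case; the function name `vif_two_regressors` and the data are hypothetical, and models with more regressors need a full auxiliary regression instead.

```python
def vif_two_regressors(x1, x2):
    """VIF and TOR when the model has exactly two regressors, x1 and x2.

    The auxiliary regression of x1 on x2 (with intercept) has R^2 = corr(x1, x2)^2.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2_aux = s12 ** 2 / (s11 * s22)
    vif = 1.0 / (1.0 - r2_aux)
    tolerance = 1.0 - r2_aux  # TOR = 1 / VIF
    return vif, tolerance, r2_aux

# Hypothetical regressors with correlation 0.8.
vif, tor, r2_aux = vif_two_regressors([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(vif, tor, r2_aux)
```

Here the VIF is well below the usual critical values of 5 or 10, even though the regressors are clearly correlated.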
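Proof GLS validity: the 1st term is: $F_{(1,1)}y_1 = F_{(1,1)}x_1'\beta + F_{(1,1)}\epsilon_1$, with $E(\epsilon_1^* = F_{(1,1)}\epsilon_1) = 0$. For the remaining terms, $y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})'\beta + u_t$. Errors are homosked. & uncorrelated → GLS is BLUE.
Heterosked. example:
Build OLS, check whether diagonal(ee') is constant.
Assume $\Omega$, find F. Create: FX, FY. REPLACE (FX(:,1)) BY ONES.
Find $\hat{\beta}_{GLS}$. Find $\hat{\sigma}^2_{GLS}$.
Autocorrelation example:
Build OLS, find e. Split into sub-vectors: $e_{i-1} = [e_1, e_2]$, $e_i = [e_2, e_3]$
Evaluate $\rho$ in: $e_t = \rho e_{t-1} + u_i$; $\hat{\rho} = \sum_{t=2}(e_t e_{t-1}) / \sum_{t=2}(e_{t-1})^2$
Derive the F matrix from the ugly matrix.
Use F to transform x & y.
Did not replace FX(:,1) by ones.
Stochastic Regressor: $E(\hat{\beta}) = \beta + E((X'X)^{-1}X'\epsilon)$
If x is no longer fixed in repeated sampling: $E((X'X)^{-1}X'\epsilon) \ne (X'X)^{-1}X'E(\epsilon)$
If X and $\epsilon$ are not independent of each other (covariance exists): $E((X'X)^{-1}X'\epsilon) \ne E((X'X)^{-1}X')E(\epsilon)$
GSRM: General Stochastic Regressor Model
PISRM: Partially Independent Stochastic Regressor Model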

Violation: Heteroskedasticity, $\mathrm{Var}(\epsilon_i) = \sigma_i^2$
Definition: heterogeneous variance of error terms
Effect: OLS remains unbiased and consistent. Estimated standard error biased.
Diagnosis: Park test (known form of $\sigma$); Breusch-Pagan test and White test (unknown form of $\sigma$)
Remedy: GLS; FGLS; robust standard error

Violation: Autocorrelation, $E(\epsilon_i \epsilon_j) \ne 0$
Definition: dependence among error terms (heterogeneous covariance)
Effect: OLS remains unbiased and consistent. Estimated standard error biased.
Diagnosis: Durbin-Watson test
Remedy: GLS; FGLS; ARMA

Violation: Stochastic regressor, $E(X'\epsilon) \ne 0$
Definition: dependence among regressors and error terms
Effect: OLS is biased and inconsistent
Diagnosis: PISRM, GSRM, measurement error
Remedy: IV; 2SLS

Violation: Multicollinearity, Rank(X) = K ≤ N does not hold
Definition: dependence among regressors
Effect: OLS remains unbiased and consistent. OLS sensitive to change of specification and data. Difficulty in distinguishing effects from specific regressors. High standard errors, high $R^2$.
Diagnosis: correlation coefficient; VIF; TOR
Remedy: Non-data-wise: revise sampling strategy, functional form (specification). Data-wise: nothing; drop redundant variable, increase sample size, transform correlated variable.
