
Lecture 3

3.1 Method of moments.

Consider a family of distributions $\{\mathbb{P}_\theta : \theta \in \Theta\}$ and a sample $X = (X_1, \ldots, X_n)$ of i.i.d. random variables with distribution $\mathbb{P}_{\theta_0}$, where $\theta_0 \in \Theta$. We assume that $\theta_0$ is unknown and we want to construct an estimate $\hat{\theta}_n = \hat{\theta}_n(X_1, \ldots, X_n)$ of $\theta_0$ based on the sample $X$. Let us recall some standard facts from probability that will be used often throughout this course.

Law of Large Numbers (LLN):

If the distribution of the i.i.d. sample $X_1, \ldots, X_n$ is such that $X_1$ has finite expectation, i.e. $\mathbb{E}|X_1| < \infty$, then the sample average
$$\bar{X}_n = \frac{X_1 + \ldots + X_n}{n}$$
converges to the expectation $\mathbb{E}X_1$ in some sense, for example, for any arbitrarily small $\varepsilon > 0$,
$$\mathbb{P}\big(|\bar{X}_n - \mathbb{E}X_1| > \varepsilon\big) \to 0 \ \text{ as } \ n \to \infty.$$
Convergence in the above sense is called convergence in probability.

Note. Whenever we use the LLN below we will simply say that the average converges to the expectation and will not mention in what sense. More mathematically inclined students are welcome to carry out these steps more rigorously, especially when we use the LLN in combination with the Central Limit Theorem.

Central Limit Theorem (CLT):

If the distribution of the i.i.d. sample $X_1, \ldots, X_n$ is such that $X_1$ has finite expectation and variance, i.e. $\mathbb{E}|X_1| < \infty$ and $\sigma^2 = \mathrm{Var}(X_1) < \infty$, then
$$\sqrt{n}\,(\bar{X}_n - \mathbb{E}X_1) \to_d N(0, \sigma^2),$$
i.e. $\sqrt{n}(\bar{X}_n - \mathbb{E}X_1)$ converges in distribution to a normal distribution with zero mean and variance $\sigma^2$, which means that for any interval $[a, b]$,
$$\mathbb{P}\Big(\sqrt{n}\,(\bar{X}_n - \mathbb{E}X_1) \in [a, b]\Big) \to \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx.$$
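To make these two statements concrete, here is a small simulation sketch (not part of the original notes; the distribution, parameter values, sample sizes, and number of repetitions are arbitrary choices for illustration). It checks that the sample average settles near the expectation and that $\sqrt{n}(\bar{X}_n - \mathbb{E}X_1)$ has an approximately normal spread with standard deviation $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0            # true mean and standard deviation (illustrative values)

# LLN: the sample average of one long sample settles near mu.
x = rng.normal(mu, sigma, size=100_000)
print("sample average:", x.mean())                 # close to mu

# CLT: repeat the experiment many times and look at sqrt(n) * (X_bar - mu).
n, reps = 1_000, 5_000
samples = rng.normal(mu, sigma, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu)
print("mean and std of z:", z.mean(), z.std())     # approximately 0 and sigma
```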

Motivating example. Consider a family of normal distributions
$$\{N(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 \ge 0\}.$$
Consider a sample $X_1, \ldots, X_n \sim N(\mu_0, \sigma_0^2)$ with distribution from this family and suppose that the parameters $\mu_0, \sigma_0^2$ are unknown. If we want to estimate these parameters based on the sample then the law of large numbers above provides a natural way to do this. Namely, the LLN tells us that
$$\bar{X}_n = \frac{X_1 + \ldots + X_n}{n} \to \mathbb{E}X_1 = \mu_0 \ \text{ as } \ n \to \infty$$
and, similarly,
$$\frac{X_1^2 + \ldots + X_n^2}{n} \to \mathbb{E}X_1^2 = \mathrm{Var}(X_1) + (\mathbb{E}X_1)^2 = \sigma_0^2 + \mu_0^2.$$
These two facts imply that
$$\bar{\sigma}^2 = \frac{X_1^2 + \ldots + X_n^2}{n} - \Big(\frac{X_1 + \ldots + X_n}{n}\Big)^2 \to \mathbb{E}X^2 - (\mathbb{E}X)^2 = \sigma_0^2.$$
It, therefore, makes sense to take $\bar{X}$ and $\bar{\sigma}^2$ as the estimates of the unknown $\mu_0, \sigma_0^2$, since by the LLN, for large sample size $n$, these estimates will approach the unknown parameters.
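As an illustration, the two moment estimates above can be computed directly from a simulated sample (a minimal sketch, not from the notes; the "unknown" parameter values and the sample size are made up for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma0 = 1.5, 2.0                      # parameters used to generate the data
n = 100_000
x = rng.normal(mu0, sigma0, size=n)

mu_hat = x.mean()                           # estimate of mu_0: first sample moment
sigma2_hat = (x ** 2).mean() - mu_hat ** 2  # estimate of sigma_0^2: second moment minus first squared

print(mu_hat, sigma2_hat)                   # should be close to 1.5 and 4.0
```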

We can generalize this example as follows. Suppose that the parameter set $\Theta \subseteq \mathbb{R}$ and suppose that we can find a function $g : \mathcal{X} \to \mathbb{R}$ such that the function
$$m(\theta) = \mathbb{E}_\theta g(X), \qquad m : \Theta \to \mathrm{Im}(m),$$
has a continuous inverse $m^{-1}$. Here $\mathbb{E}_\theta$ denotes the expectation with respect to the distribution $\mathbb{P}_\theta$. Take
$$\hat{\theta} = m^{-1}(\bar{g}) = m^{-1}\Big(\frac{g(X_1) + \ldots + g(X_n)}{n}\Big)$$
as the estimate of $\theta_0$. (Here we implicitly assumed that $\bar{g}$ is always in the set $\mathrm{Im}(m)$.) Since the sample comes from the distribution with parameter $\theta_0$, by the LLN we have
$$\bar{g} \to \mathbb{E}_{\theta_0} g(X_1) = m(\theta_0).$$


Since the inverse $m^{-1}$ is continuous, this implies that our estimate
$$\hat{\theta} = m^{-1}(\bar{g}) \to m^{-1}(m(\theta_0)) = \theta_0$$
converges to the unknown parameter $\theta_0$. Typical choices of the function $g$ are $g(x) = x$ or $g(x) = x^2$. The quantity $\mathbb{E}X^k$ is called the $k$-th moment of $X$ and, hence, the name: method of moments.
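The construction above is easy to express as a small generic routine. The following sketch is my own illustration, not from the notes; the names `method_of_moments`, `g`, and `m_inverse` are made up. It simply averages $g(X_i)$ and applies a user-supplied inverse $m^{-1}$:

```python
import numpy as np
from typing import Callable

def method_of_moments(x: np.ndarray,
                      g: Callable[[np.ndarray], np.ndarray],
                      m_inverse: Callable[[float], float]) -> float:
    """Return m^{-1}(g_bar), the method-of-moments estimate based on the sample x."""
    g_bar = g(x).mean()        # LLN: g_bar -> E_{theta_0} g(X_1) = m(theta_0)
    return m_inverse(g_bar)    # continuity of m^{-1} gives consistency

# Example: normal family with g(x) = x, so m(mu) = mu and m^{-1} is the identity.
rng = np.random.default_rng(2)
x = rng.normal(1.5, 2.0, size=50_000)
print(method_of_moments(x, lambda t: t, lambda s: s))   # close to 1.5
```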

Example. Consider the family of exponential distributions $E(\alpha)$ with p.d.f.
$$p(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$
Take $g(x) = x$. Then
$$m(\alpha) = \mathbb{E}_\alpha g(X) = \mathbb{E}_\alpha X = \frac{1}{\alpha}$$
($1/\alpha$ is the expectation of the exponential distribution, see Pset 1). Let us recall that we can find the inverse by solving the equation $m(\alpha) = \theta$ for $\alpha$, i.e. in our case $1/\alpha = \theta$. We have $\alpha = m^{-1}(\theta) = 1/\theta$. Therefore, we take
$$\hat{\alpha} = m^{-1}(\bar{g}) = m^{-1}(\bar{X}) = \frac{1}{\bar{X}}$$
as the estimate of the unknown $\alpha_0$.

Take $g(x) = x^2$. Then
$$m(\alpha) = \mathbb{E}_\alpha g(X) = \mathbb{E}_\alpha X^2 = \frac{2}{\alpha^2}.$$
The inverse is $\alpha = m^{-1}(\theta) = \sqrt{2/\theta}$ and we take
$$\hat{\alpha} = m^{-1}(\bar{g}) = m^{-1}\big(\overline{X^2}\big) = \sqrt{\frac{2}{\overline{X^2}}}$$
as another estimate of $\alpha_0$. The question is, which estimate is better?
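One way to build intuition before the theory below is a quick simulation (my own sketch, not part of the notes; the value of $\alpha_0$, the sample size, and the number of repetitions are arbitrary): compute both estimates on many independent samples and compare their spread around $\alpha_0$.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha0, n, reps = 2.0, 500, 10_000

# Each row is one sample of size n from the exponential distribution E(alpha0).
samples = rng.exponential(scale=1.0 / alpha0, size=(reps, n))

est1 = 1.0 / samples.mean(axis=1)                  # estimate based on g(x) = x
est2 = np.sqrt(2.0 / (samples ** 2).mean(axis=1))  # estimate based on g(x) = x^2

print("std of estimate with g(x) = x  :", est1.std())
print("std of estimate with g(x) = x^2:", est2.std())  # somewhat larger
```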


1. Consistency. We say that an estimate $\hat{\theta}$ is consistent if $\hat{\theta} \to \theta_0$ in probability as $n \to \infty$. We have shown above that, by construction, the estimate by the method of moments is always consistent.

2. Asymptotic Normality. We say that $\hat{\theta}$ is asymptotically normal if
$$\sqrt{n}\,(\hat{\theta} - \theta_0) \to_d N(0, \sigma_{\theta_0}^2),$$
where $\sigma_{\theta_0}^2$ is called the asymptotic variance of the estimate $\hat{\theta}$.

Theorem. The estimate $\hat{\theta} = m^{-1}(\bar{g})$ by the method of moments is asymptotically normal with asymptotic variance
$$\sigma_{\theta_0}^2 = \frac{\mathrm{Var}_{\theta_0}(g(X_1))}{(m'(\theta_0))^2}.$$

Proof. Writing the Taylor expansion of the function $m^{-1}$ at the point $m(\theta_0)$ we have
$$m^{-1}(\bar{g}) = m^{-1}(m(\theta_0)) + (m^{-1})'(m(\theta_0))(\bar{g} - m(\theta_0)) + \frac{(m^{-1})''(c)}{2!}(\bar{g} - m(\theta_0))^2,$$
where $c \in [m(\theta_0), \bar{g}]$. Since $m^{-1}(m(\theta_0)) = \theta_0$, we get
$$m^{-1}(\bar{g}) - \theta_0 = (m^{-1})'(m(\theta_0))(\bar{g} - m(\theta_0)) + \frac{(m^{-1})''(c)}{2!}(\bar{g} - m(\theta_0))^2.$$
Let us prove that the left hand side multiplied by $\sqrt{n}$ converges in distribution to a normal distribution:
$$\sqrt{n}\,(m^{-1}(\bar{g}) - \theta_0) = (m^{-1})'(m(\theta_0))\,\sqrt{n}(\bar{g} - m(\theta_0)) + \frac{(m^{-1})''(c)}{2!}\,\frac{1}{\sqrt{n}}\big(\sqrt{n}(\bar{g} - m(\theta_0))\big)^2. \qquad (3.1)$$

Let us recall that
$$\bar{g} = \frac{g(X_1) + \ldots + g(X_n)}{n}, \qquad \mathbb{E}g(X_1) = m(\theta_0).$$

The central limit theorem tells us that
$$\sqrt{n}\,(\bar{g} - m(\theta_0)) \to N(0, \mathrm{Var}_{\theta_0}(g(X_1))),$$
where the convergence is in distribution. First of all, this means that the last term in (3.1) converges to 0 (in probability), since it has another factor of $1/\sqrt{n}$. Also, since by the calculus formula for the derivative of the inverse
$$(m^{-1})'(m(\theta_0)) = \frac{1}{m'(m^{-1}(m(\theta_0)))} = \frac{1}{m'(\theta_0)},$$


the first term in (3.1) converges in distribution to
$$(m^{-1})'(m(\theta_0))\,\sqrt{n}(\bar{g} - m(\theta_0)) \to \frac{1}{m'(\theta_0)}\,N(0, \mathrm{Var}_{\theta_0}(g(X_1))) = N\Big(0, \frac{\mathrm{Var}_{\theta_0}(g(X_1))}{(m'(\theta_0))^2}\Big).$$

What this result tells us is that the smaller
$$\sigma_{\theta_0}^2 = \frac{\mathrm{Var}_{\theta_0}(g(X_1))}{(m'(\theta_0))^2}$$
is, the better is the estimate, in the sense that asymptotically it has smaller deviations from the unknown parameter $\theta_0$.
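For instance, in the exponential example above one can compute both asymptotic variances explicitly (a quick check, not worked out in the notes, using $\mathbb{E}_\alpha X^k = k!/\alpha^k$). For $g(x) = x$: $m(\alpha) = 1/\alpha$, $m'(\alpha) = -1/\alpha^2$, $\mathrm{Var}_\alpha(X) = 1/\alpha^2$, so
$$\sigma_{\alpha_0}^2 = \frac{1/\alpha_0^2}{(1/\alpha_0^2)^2} = \alpha_0^2.$$
For $g(x) = x^2$: $m(\alpha) = 2/\alpha^2$, $m'(\alpha) = -4/\alpha^3$, $\mathrm{Var}_\alpha(X^2) = 24/\alpha^4 - (2/\alpha^2)^2 = 20/\alpha^4$, so
$$\sigma_{\alpha_0}^2 = \frac{20/\alpha_0^4}{16/\alpha_0^6} = \frac{5}{4}\,\alpha_0^2.$$
If this computation is right, the estimate based on $g(x) = x$ has the smaller asymptotic variance and is therefore the better of the two in the sense above.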
