Peter Bartlett
M-Estimators.
Consistency of M-Estimators.
Nonparametric maximum likelihood.
M-estimators
An M-estimator $\hat\theta_n$ maximizes a criterion of the form
\[
M_n(\theta) = P_n m_\theta .
\]
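Throughout, $P_n$ denotes the empirical measure of the sample $X_1, \dots, X_n$ and $P$ the underlying distribution, so that
\[
P_n f = \frac{1}{n} \sum_{i=1}^n f(X_i), \qquad P f = \int f \, dP ,
\]
and $M_n(\theta) = P_n m_\theta$ is the sample analogue of $M(\theta) = P m_\theta$.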
Z-estimators
A Z-estimator $\hat\theta_n$ solves
\[
\Psi_n(\theta) = P_n \psi_\theta = 0 .
\]
These are estimating equations. van der Vaart calls this a Z-estimator (Z
for zero), but it's often called an M-estimator (even if there's no
maximization).
Example: maximum likelihood, with $m_\theta = \log p_\theta$ and, when the maximum is an interior stationary point, $\psi_\theta = \partial_\theta \log p_\theta$ (the score).
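As a concrete illustration (a minimal numerical sketch, not from the slides): for the exponential model with rate $\lambda$, the log-density is $m_\lambda(x) = \log\lambda - \lambda x$, the score is $\psi_\lambda(x) = 1/\lambda - x$, and solving $P_n \psi_\lambda = 0$ recovers $\hat\lambda = 1/\bar X_n$.

import numpy as np
from scipy.optimize import brentq

# Sketch: Exponential(rate) maximum likelihood as a Z-estimator.
# psi_lambda(x) = 1/lambda - x, so P_n psi_lambda = 0 at lambda = 1/mean(X).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1000)  # true rate 2.5

def Psi_n(lam):
    return np.mean(1.0 / lam - x)  # empirical estimating equation

lam_hat = brentq(Psi_n, 1e-6, 100.0)  # zero of the averaged score
print(lam_hat, 1 / x.mean())          # the two agree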
M-estimators and Z-estimators
\[
\hat\theta = \arg\max_\theta P_n m_\theta .
\]
For example, for the uniform density $p_\theta(x) = \tfrac{1}{\theta}\, 1[x \in [0, \theta]]$,
\[
\hat\theta = \arg\max_\theta P_n \log p_\theta = \max_i X_i ,
\]
since the likelihood is $\theta^{-n} 1[\theta \ge \max_i X_i]$, which is decreasing in $\theta$. Here the maximizer is not a stationary point, so this M-estimator has no Z-estimator formulation.
M-estimators and Z-estimators: Examples
Mean:
\[
m_\theta(x) = -(x - \theta)^2 , \qquad \psi_\theta(x) = x - \theta .
\]
Median:
\[
m_\theta(x) = -|x - \theta| , \qquad \psi_\theta(x) = \operatorname{sign}(x - \theta) .
\]
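A quick numerical check (a sketch, not part of the slides): maximizing $P_n m_\theta$ for these two choices of $m_\theta$ reproduces the sample mean and the sample median.

import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: maximize theta -> P_n m_theta, i.e. minimize the average loss.
rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=501)  # heavy-tailed sample, odd size

def m_estimate(loss):
    res = minimize_scalar(lambda t: np.mean(loss(x - t)),
                          bounds=(-10, 10), method="bounded")
    return res.x

print(m_estimate(np.square), np.mean(x))   # squared loss -> sample mean
print(m_estimate(np.abs), np.median(x))    # absolute loss -> sample median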
M-estimators and Z-estimators: Examples
Huber: [Figure: the Huber loss $r_k$ and the clipped $\psi$]
\[
m_\theta(x) = -r_k(x - \theta), \qquad
r_k(x) =
\begin{cases}
\tfrac{1}{2} k^2 - k (x + k) & \text{if } x < -k, \\
\tfrac{1}{2} x^2 & \text{if } |x| \le k, \\
\tfrac{1}{2} k^2 + k (x - k) & \text{if } x > k.
\end{cases}
\]
\[
\psi_\theta(x) = [x - \theta]_{-k}^{k} , \qquad
[x]_{-k}^{k} =
\begin{cases}
-k & \text{if } x < -k, \\
x & \text{if } |x| \le k, \\
k & \text{if } x > k.
\end{cases}
\]
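The clipped $\psi$ makes the location estimate insensitive to outliers. A small simulation (a sketch, not from the slides; the constant $k = 1.345$ is a conventional choice, not one the slides fix):

import numpy as np
from scipy.optimize import brentq

# Sketch: the Huber location estimate as a Z-estimator. psi clips the
# residual x - theta to [-k, k]; theta -> P_n psi_theta is continuous and
# nonincreasing, so a root-finder applies.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 50.0)])  # 5 outliers

def huber_estimate(x, k):
    f = lambda t: np.mean(np.clip(x - t, -k, k))
    return brentq(f, x.min(), x.max())

print(huber_estimate(x, k=1.345))  # stays near 0 despite the outliers
print(np.mean(x))                  # the mean is dragged toward 50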
Consistency of M-estimators and Z-estimators
We want to show that $\hat\theta_n \stackrel{P}{\to} \theta_0$, where $\hat\theta_n$ approximately maximizes $M_n(\theta) = P_n m_\theta$ and $\theta_0$ maximizes $M(\theta) = P m_\theta$. We use a ULLN. The proof below uses three conditions:
(1) $\sup_\theta |M_n(\theta) - M(\theta)| \stackrel{P}{\to} 0$ (the ULLN);
(2) $\theta_0$ is well-separated: for every $\epsilon > 0$ there is a $\delta > 0$ such that $d(\theta, \theta_0) \ge \epsilon$ implies $M(\theta) \le M(\theta_0) - \delta$;
(3) $M_n(\hat\theta_n) \ge M_n(\theta_0) - o_P(1)$ (near-maximization).
Proof
\[
\begin{aligned}
\Pr\bigl( d(\hat\theta_n, \theta_0) \ge \epsilon \bigr)
&\le \Pr\bigl( M(\theta_0) - M(\hat\theta_n) \ge \delta \bigr) \\
&= \Pr\bigl( M(\theta_0) - M_n(\theta_0) + M_n(\theta_0) - M_n(\hat\theta_n) + M_n(\hat\theta_n) - M(\hat\theta_n) \ge \delta \bigr) \\
&\le \Pr\bigl( M(\theta_0) - M_n(\theta_0) \ge \delta/3 \bigr)
 + \Pr\bigl( M_n(\theta_0) - M_n(\hat\theta_n) \ge \delta/3 \bigr) \\
&\quad + \Pr\bigl( M_n(\hat\theta_n) - M(\hat\theta_n) \ge \delta/3 \bigr) .
\end{aligned}
\]
The first inequality is the well-separation condition (2). Then (1) implies the first and third probabilities go to zero, and (3) implies the second probability goes to zero.
Example: Sample median
The estimating equation is
\[
\Psi_n(\theta) = P_n \psi_\theta = P_n \operatorname{sign}(X - \theta) .
\]
Suppose that $P$ is continuous and positive around the median, and check the conditions:
1. The class $\{ x \mapsto \operatorname{sign}(x - \theta) : \theta \in \mathbb{R} \}$ is Glivenko-Cantelli.
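To see condition 1 in action (a simulation sketch, not from the slides), compare $P_n \operatorname{sign}(X - \theta)$ with $P \operatorname{sign}(X - \theta) = 1 - 2\Phi(\theta)$ for $P = N(0,1)$, and watch the supremum over $\theta$ shrink:

import numpy as np
from scipy.stats import norm

# Sketch: sup_theta |P_n sign(X - theta) - P sign(X - theta)| over a grid,
# for P = N(0,1), where P sign(X - theta) = 1 - 2 * Phi(theta).
rng = np.random.default_rng(3)
thetas = np.linspace(-4, 4, 2001)
for n in (100, 1000, 10000):
    x = rng.standard_normal(n)
    emp = np.array([np.mean(np.sign(x - t)) for t in thetas])
    print(n, np.max(np.abs(emp - (1 - 2 * norm.cdf(thetas)))))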
Hellinger distance
We have
\[
\begin{aligned}
h(p, q)^2 &= \frac{1}{2} \int \bigl( p^{1/2} - q^{1/2} \bigr)^2 \, d\mu \\
&= \frac{1}{2} \int \bigl( p + q - 2 p^{1/2} q^{1/2} \bigr) \, d\mu \\
&= 1 - \int p^{1/2} q^{1/2} \, d\mu .
\end{aligned}
\]
This latter integral is called the Hellinger affinity. Expressing $h$ in this form
can simplify its calculation for product densities. Notice that, by
Cauchy-Schwarz,
\[
\int p^{1/2} q^{1/2} \, d\mu \le \sqrt{\int p \, d\mu} \sqrt{\int q \, d\mu} = 1 ,
\]
so $0 \le h(p, q)^2 \le 1$.
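For instance (a numerical sketch, not from the slides), the affinity form makes $h$ easy to compute by quadrature. For two unit-variance Gaussians the affinity has the closed form $\exp(-(\mu_1 - \mu_2)^2 / 8)$, which the integral reproduces:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Sketch: h(p, q)^2 = 1 - int sqrt(p q) dmu for p = N(0,1), q = N(1,1).
p, q = norm(0.0, 1.0).pdf, norm(1.0, 1.0).pdf
affinity, _ = quad(lambda x: np.sqrt(p(x) * q(x)), -np.inf, np.inf)
print(1 - affinity, 1 - np.exp(-1.0 / 8))  # quadrature vs closed form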
Non-parametric maximum likelihood
For the nonparametric maximum likelihood estimator $\hat p_n$,
\[
\begin{aligned}
d_{KL}(\hat p_n, p_0)
&= \int \log\frac{p_0}{\hat p_n} \, p_0 \, d\mu \\
&\le \int \log\frac{p_0}{\hat p_n} \, p_0 \, d\mu - P_n \log\frac{p_0}{\hat p_n} \\
&= P \log\frac{p_0}{\hat p_n} - P_n \log\frac{p_0}{\hat p_n} \\
&\le \|P - P_n\|_{\mathcal G} ,
\end{aligned}
\]
where the first inequality follows from the fact that $\hat p_n$ maximizes $P_n \log p$
over $p \in \mathcal P$ (so that $P_n \log(p_0 / \hat p_n) \le 0$), and the class $\mathcal G$ is defined as
\[
\mathcal G = \Bigl\{ 1[p_0 > 0] \log\frac{p_0}{p} : p \in \mathcal P \Bigr\} .
\]
Non-parametric maximum likelihood
One problem here is that $\log(p_0/p)$ is unbounded, since $p$ can be zero.
We'll take a different approach: for any $p \in \mathcal P$, consider the mixture
\[
\bar p = \frac{p + p_0}{2} .
\]
If the class $\mathcal P$ is convex and $\hat p_n, p_0 \in \mathcal P$, this mixture has
$P_n \log \bar p \le P_n \log \hat p_n$. This is behind the following lemma.

Lemma: Define
\[
\bar p_n = \frac{\hat p_n + p_0}{2} .
\]
If $\mathcal P$ is convex,
\[
h(\bar p_n, p_0)^2 \le \int \frac{\hat p_n}{\bar p_n} \, d(P_n - P) .
\]
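One way to see why the lemma holds (a reconstructed sketch; the slides' own proof is not shown here): since $\hat p_n$ maximizes $P_n \log p$ over $\mathcal P$ and $\bar p_n \in \mathcal P$ by convexity, $P_n \log(\hat p_n / \bar p_n) \ge 0$, and $\log x \le x - 1$ then gives $P_n (\hat p_n / \bar p_n) \ge 1$. Writing $\hat p_n = 2 \bar p_n - p_0$,
\[
\int \frac{\hat p_n}{\bar p_n} \, d(P_n - P)
\ge 1 - P \frac{\hat p_n}{\bar p_n}
= \int \frac{p_0^2}{\bar p_n} \, d\mu - 1
= \int \frac{(p_0 - \bar p_n)^2}{\bar p_n} \, d\mu
\ge \int \bigl( p_0^{1/2} - \bar p_n^{1/2} \bigr)^2 \, d\mu
= 2\, h(\bar p_n, p_0)^2 ,
\]
using $(p_0 - \bar p_n)^2 = (p_0^{1/2} - \bar p_n^{1/2})^2 (p_0^{1/2} + \bar p_n^{1/2})^2$ and $(p_0^{1/2} + \bar p_n^{1/2})^2 \ge \bar p_n$.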
Non-parametric maximum likelihood
Combining with the lemma, since $\hat p_n / \bar p_n = 2 \hat p_n / (\hat p_n + p_0)$,
\[
h(\bar p_n, p_0)^2 \le \|P - P_n\|_{\mathcal G} ,
\]
where
\[
\mathcal G = \Bigl\{ \frac{2p}{p + p_0} : p \in \mathcal P \Bigr\} .
\]
Unlike the log ratios above, every function in $\mathcal G$ takes values in $[0, 2]$, so this class is uniformly bounded.
Non-parametric maximum likelihood: Example
1. For all $p \in \operatorname{conv} \mathcal P$, $\dfrac{p(x)}{p(y)} \le 1 + L \|x - y\|$.
2. For all $p, p_0 \in \operatorname{conv} \mathcal P$, $\dfrac{2p}{p + p_0}$ is $O(L^2)$-Lipschitz w.r.t. $\|\cdot\|$.
3. $\|P - P_n\|_{\mathcal G} \stackrel{a.s.}{\to} 0$, where
\[
\mathcal G = \Bigl\{ \frac{2p}{p + p_0} : p \in \operatorname{conv} \mathcal P \Bigr\} .
\]
Non-parametric maximum likelihood: Example
But notice that the dependence on the dimension $d$ is terrible: the rate is
exponentially slow in $d$. This is because the Lipschitz property is a very weak restriction.