A plug-in classifier is
\[
f(x) =
\begin{cases}
1, & \tilde\rho_1(x) > \tilde\rho_0(x) \\
0, & \text{otherwise.}
\end{cases}
\]
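As a concrete illustration (not part of the notes), for the Gaussian model used below, with $\tilde\rho_0$ the density of $\mathcal{N}(-\hat\theta, \sigma^2 I)$ and $\tilde\rho_1$ the density of $\mathcal{N}(\hat\theta, \sigma^2 I)$, the plug-in rule reduces to a sign test on $\langle x, \hat\theta\rangle$; the function name and argument shapes here are illustrative:

```python
import numpy as np

def plugin_classifier(x, theta_hat, sigma):
    """Plug-in rule for the estimated class densities
    rho~0 = N(-theta_hat, sigma^2 I) and rho~1 = N(theta_hat, sigma^2 I)."""
    # log rho~1(x) - log rho~0(x)
    #   = (||x + theta_hat||^2 - ||x - theta_hat||^2) / (2 sigma^2)
    #   = 2 <x, theta_hat> / sigma^2,
    # so the rule predicts 1 exactly when <x, theta_hat> > 0.
    log_ratio = 2.0 * np.dot(x, theta_hat) / sigma**2
    return 1 if log_ratio > 0 else 0
```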
We will use the following upper bound on the classification error; a proof is given below.
Lemma 1.
\[
\mathbb{E}[\mathbf{1}\{f(X) \neq Y\}] - \mathbb{E}[\mathbf{1}\{f^*(X) \neq Y\}]
\le \frac{1}{2}\int |\rho_0(x) - \tilde\rho_0(x)|\,dx
+ \frac{1}{2}\int |\rho_1(x) - \tilde\rho_1(x)|\,dx.
\]
Now, our classifier first estimates $\theta$, and then approximates the class-conditional densities using $\hat\theta$. Therefore $\tilde\rho_0(x)$ is the density of $\mathcal{N}(-\hat\theta, \sigma^2 I)$ and $\tilde\rho_1(x)$ is the density of $\mathcal{N}(\hat\theta, \sigma^2 I)$. Using Pinsker's inequality we have
\[
\frac{1}{2}\int |\rho_0(x) - \tilde\rho_0(x)|\,dx
\le \sqrt{\frac{D_{\mathrm{kl}}\bigl(\mathcal{N}(\theta,\sigma^2 I)\,\|\,\mathcal{N}(\hat\theta,\sigma^2 I)\bigr)}{2}}
= \frac{\|\theta - \hat\theta\|_2}{2\sigma}.
\]
Similarly,
\[
\frac{1}{2}\int |\rho_1(x) - \tilde\rho_1(x)|\,dx
\le \frac{\|\theta - \hat\theta\|_2}{2\sigma}.
\]
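The last equality in each bound uses the standard closed form of the KL divergence between two Gaussians with a common covariance $\sigma^2 I$, which for completeness is
\[
D_{\mathrm{kl}}\bigl(\mathcal{N}(\theta,\sigma^2 I)\,\|\,\mathcal{N}(\hat\theta,\sigma^2 I)\bigr)
= \frac{\|\theta - \hat\theta\|_2^2}{2\sigma^2},
\qquad\text{so}\qquad
\sqrt{\frac{D_{\mathrm{kl}}}{2}} = \frac{\|\theta - \hat\theta\|_2}{2\sigma}.
\]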
Therefore, adding the two bounds (each contributing $\|\theta - \hat\theta\|_2/(2\sigma)$) and taking the expectation over the sample, we obtain
\[
\mathbb{E}[\mathbf{1}\{f(X) \neq Y\}] - \mathbb{E}[\mathbf{1}\{f^*(X) \neq Y\}]
\le \mathbb{E}_{(X_1,Y_1),\dots,(X_n,Y_n)}\!\left[\frac{\|\theta - \hat\theta((X_1,Y_1),\dots,(X_n,Y_n))\|_2}{\sigma}\right]. \tag{1}
\]
Now we have $\mathbb{E}[\hat\theta] = \theta$ and $\mathrm{Cov}[\hat\theta] = \frac{\sigma^2 I}{n}$.
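These two moments control the right-hand side of (1) via Jensen's inequality; a sketch of that step (not spelled out in the notes), writing $d$ for the dimension:
\[
\mathbb{E}\,\|\theta - \hat\theta\|_2
\le \sqrt{\mathbb{E}\,\|\theta - \hat\theta\|_2^2}
= \sqrt{\operatorname{tr}\operatorname{Cov}[\hat\theta]}
= \sigma\sqrt{\frac{d}{n}},
\]
where the middle equality uses $\mathbb{E}[\hat\theta] = \theta$.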
Lemma 1 (restated).
\[
\mathbb{E}[\mathbf{1}\{f(X) \neq Y\}] - \mathbb{E}[\mathbf{1}\{f^*(X) \neq Y\}]
\le \frac{1}{2}\int |\rho_0(x) - \tilde\rho_0(x)|\,dx
+ \frac{1}{2}\int |\rho_1(x) - \tilde\rho_1(x)|\,dx.
\]
Proof. For any classifier $f$ we have
\begin{align*}
\mathbb{E}[\mathbf{1}\{f(X) \neq Y\}]
&= \frac{1}{2}\int \mathbf{1}\{f(x)=1\}\rho_0(x)\,dx + \frac{1}{2}\int \mathbf{1}\{f(x)=0\}\rho_1(x)\,dx \\
&= \frac{1}{2}\int \mathbf{1}\{f(x)=1\}\rho_0(x)\,dx + \frac{1}{2}\int \bigl(1 - \mathbf{1}\{f(x)=1\}\bigr)\rho_1(x)\,dx \\
&= \frac{1}{2} + \frac{1}{2}\int \mathbf{1}\{f(x)=1\}\rho_0(x)\,dx - \frac{1}{2}\int \mathbf{1}\{f(x)=1\}\rho_1(x)\,dx \\
&= \frac{1}{2} + \frac{1}{2}\int \mathbf{1}\{f(x)=1\}\bigl(\rho_0(x) - \rho_1(x)\bigr)\,dx.
\end{align*}
Therefore, for the classifiers $f, f^*$ we have that
\[
\mathbb{E}[\mathbf{1}\{f(X) \neq Y\}] - \mathbb{E}[\mathbf{1}\{f^*(X) \neq Y\}]
= \frac{1}{2}\int \bigl(\mathbf{1}\{f(x)=1\} - \mathbf{1}\{f^*(x)=1\}\bigr)\bigl(\rho_0(x) - \rho_1(x)\bigr)\,dx.
\]
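As a sanity check of this identity (not part of the notes), one can evaluate both sides in one dimension, with illustrative values $\theta = 1$, $\sigma = 1$, taking $f^*(x) = \mathbf{1}\{x > 0\}$ (the Bayes rule for $\mathcal{N}(-\theta, \sigma^2)$ vs.\ $\mathcal{N}(\theta, \sigma^2)$ with equal priors) and a perturbed threshold rule $f(x) = \mathbf{1}\{x > t\}$; both sides reduce to Gaussian CDF evaluations:

```python
import math

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

theta, sigma = 1.0, 1.0   # illustrative values
t = 0.5                   # threshold of the perturbed rule f(x) = 1{x > t}

def err(threshold):
    # (1/2) P_0(X > threshold) + (1/2) P_1(X <= threshold)
    p0 = 1.0 - Phi((threshold + theta) / sigma)
    p1 = Phi((threshold - theta) / sigma)
    return 0.5 * (p0 + p1)

# Left-hand side: excess risk of f over the Bayes rule f* (threshold 0).
lhs = err(t) - err(0.0)

# Right-hand side: (1/2) * integral of (1{f=1} - 1{f*=1})(rho_0 - rho_1) dx.
# The first factor is -1 on (0, t) and 0 elsewhere, so this equals
# (1/2) * integral over (0, t) of (rho_1 - rho_0) dx.
rhs = 0.5 * ((Phi((t - theta) / sigma) - Phi(-theta / sigma))
             - (Phi((t + theta) / sigma) - Phi(theta / sigma)))

print(lhs, rhs)
```

The two quantities agree, and the excess risk is nonnegative, as it must be since $f^*$ is Bayes optimal.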
Now we prove that