distributions by $\rho_0(x) = p(x \mid Y = 0)$, $\rho_1(x) = p(x \mid Y = 1)$. The Bayes optimal classifier is

$$f^*(x) = \begin{cases} 1, & \rho_1(x) > \rho_0(x),\\ 0, & \text{otherwise.} \end{cases}$$

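A plug-in classifier replaces the unknown class-conditional densities by estimates $\tilde\rho_0$, $\tilde\rho_1$:

$$f(x) = \begin{cases} 1, & \tilde\rho_1(x) > \tilde\rho_0(x),\\ 0, & \text{otherwise.} \end{cases}$$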
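In the Gaussian setting considered in these notes, where $\tilde\rho_0$ and $\tilde\rho_1$ are the densities of $N(-\hat\theta, \sigma^2 I)$ and $N(\hat\theta, \sigma^2 I)$, the comparison $\tilde\rho_1(x) > \tilde\rho_0(x)$ reduces to a sign test, because $\log\tilde\rho_1(x) - \log\tilde\rho_0(x) = 2\hat\theta^T x/\sigma^2$. The Python sketch below (NumPy assumed; the function names are illustrative, not from the notes) implements the plug-in rule literally, by comparing log-densities, and checks it against the linear rule $1\{\hat\theta^T x > 0\}$ on random points.

```python
import numpy as np


def log_gauss_isotropic(x, mean, sigma):
    """Log-density of N(mean, sigma^2 I) at each row of x, where x has shape (m, d)."""
    d = x.shape[1]
    sq_dist = np.sum((x - mean) ** 2, axis=1)
    return -sq_dist / (2.0 * sigma**2) - 0.5 * d * np.log(2.0 * np.pi * sigma**2)


def plug_in_predict(x, theta_hat, sigma):
    """f(x) = 1 iff rho~_1(x) > rho~_0(x), with rho~_1 = N(theta_hat, sigma^2 I) and
    rho~_0 = N(-theta_hat, sigma^2 I). Comparing logs avoids underflow for large d."""
    log_rho1 = log_gauss_isotropic(x, theta_hat, sigma)
    log_rho0 = log_gauss_isotropic(x, -theta_hat, sigma)
    return (log_rho1 > log_rho0).astype(int)


rng = np.random.default_rng(0)
d, sigma = 3, 1.5
theta_hat = rng.standard_normal(d)          # stand-in for an estimate of theta
x = 3.0 * rng.standard_normal((1000, d))    # arbitrary query points

# The log-ratio should equal 2 theta_hat^T x / sigma^2.
log_ratio = log_gauss_isotropic(x, theta_hat, sigma) - log_gauss_isotropic(x, -theta_hat, sigma)
print("closed-form log ratio matches:", np.allclose(log_ratio, 2.0 * x @ theta_hat / sigma**2))

# Away from the decision boundary, the plug-in rule agrees with 1{theta_hat^T x > 0}.
margin = x @ theta_hat
safe = np.abs(margin) > 1e-9
agree = plug_in_predict(x, theta_hat, sigma)[safe] == (margin[safe] > 0)
print("agrees with sign rule:", bool(agree.all()))
```

Working with log-densities rather than densities is a design choice: in high dimension both densities can underflow to zero, while their logs stay finite.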

We will use the following upper bound on the excess classification error, i.e. the gap to the Bayes classifier. A proof is given below.

Lemma 1.

$$E[1\{f(X) \neq Y\}] - E[1\{f^*(X) \neq Y\}] \le \frac12\int |\rho_0(x) - \tilde\rho_0(x)|\,dx + \frac12\int |\rho_1(x) - \tilde\rho_1(x)|\,dx.$$
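Lemma 1 holds for any pair of estimated densities, not only Gaussian ones. A quick numerical sanity check, sketched in Python with NumPy, discretizes the real line, evaluates both risks by Riemann sums, and compares the excess risk with the right-hand side of the lemma. It assumes $P(Y=0) = P(Y=1) = 1/2$ (implicit in the factors $\frac12$ of the proof below); the true densities $N(\mp 1, 1)$ and the deliberately misspecified estimates are illustrative choices.

```python
import numpy as np


def gauss_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def risk(decide_one, rho0, rho1, dx):
    """P(f(X) != Y) for P(Y=0) = P(Y=1) = 1/2; decide_one marks grid points where f(x) = 1."""
    return 0.5 * np.sum(rho0[decide_one]) * dx + 0.5 * np.sum(rho1[~decide_one]) * dx


def lemma1_sides(rho0, rho1, rho0_t, rho1_t, dx):
    """Return (excess risk of the plug-in rule, right-hand side of Lemma 1)."""
    bayes = rho1 > rho0            # f*(x) = 1 iff rho_1(x) > rho_0(x)
    plug_in = rho1_t > rho0_t      # f(x)  = 1 iff rho~_1(x) > rho~_0(x)
    excess = risk(plug_in, rho0, rho1, dx) - risk(bayes, rho0, rho1, dx)
    rhs = 0.5 * np.sum(np.abs(rho0 - rho0_t)) * dx + 0.5 * np.sum(np.abs(rho1 - rho1_t)) * dx
    return excess, rhs


x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
rho0, rho1 = gauss_pdf(x, -1.0, 1.0), gauss_pdf(x, 1.0, 1.0)      # true class conditionals
rho0_t, rho1_t = gauss_pdf(x, -0.3, 1.5), gauss_pdf(x, 1.8, 0.8)  # illustrative estimates
excess, rhs = lemma1_sides(rho0, rho1, rho0_t, rho1_t, dx)
print(f"excess risk = {excess:.4f} <= Lemma 1 bound = {rhs:.4f}")
```

Because the lemma is proved pointwise in $x$, the inequality also holds exactly for the discretized sums, up to floating-point rounding.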

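Now, our classifier first estimates $\theta$ and then approximates the class-conditional distributions using $\hat\theta$: $\tilde\rho_0(x)$ is the density of $N(-\hat\theta, \sigma^2 I)$ and $\tilde\rho_1(x)$ is the density of $N(\hat\theta, \sigma^2 I)$. Using Pinsker's inequality, together with $D_{KL}\big(N(\mu, \sigma^2 I)\,\|\,N(\mu', \sigma^2 I)\big) = \|\mu - \mu'\|_2^2/(2\sigma^2)$, we have

$$\frac12\int |\rho_0(x) - \tilde\rho_0(x)|\,dx \le \sqrt{\frac{D_{KL}\big(N(-\theta, \sigma^2 I)\,\|\,N(-\hat\theta, \sigma^2 I)\big)}{2}} = \frac{\|\theta - \hat\theta\|_2}{2\sigma}.$$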

Similarly,

$$\frac12\int |\rho_1(x) - \tilde\rho_1(x)|\,dx \le \frac{\|\theta - \hat\theta\|_2}{2\sigma}.$$
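The two Pinsker steps can be checked in closed form. For Gaussians with a common covariance $\sigma^2 I$, $D_{KL} = \|\mu - \mu'\|_2^2/(2\sigma^2)$, and the total variation distance $\frac12\int|p - q|$ equals $2\Phi(\delta/2) - 1$ with $\delta = \|\mu - \mu'\|_2/\sigma$ (a standard fact, used here only for the check). The Python sketch below (NumPy assumed; helper names are illustrative) compares the exact total variation with the Pinsker bound for random perturbations of $\theta$, and cross-checks the total variation formula against a Riemann sum in one dimension.

```python
import math

import numpy as np


def kl_isotropic(a, b, sigma):
    """D_KL(N(a, sigma^2 I) || N(b, sigma^2 I)) = ||a - b||_2^2 / (2 sigma^2)."""
    return float(np.sum((a - b) ** 2)) / (2.0 * sigma**2)


def tv_isotropic(a, b, sigma):
    """(1/2) * integral of |p - q| for p = N(a, sigma^2 I), q = N(b, sigma^2 I).
    With delta = ||a - b||_2 / sigma this equals 2 Phi(delta / 2) - 1 = erf(delta / (2 sqrt 2))."""
    delta = float(np.linalg.norm(a - b)) / sigma
    return math.erf(delta / (2.0 * math.sqrt(2.0)))


rng = np.random.default_rng(1)
d, sigma = 4, 0.7
results = []
for _ in range(5):
    theta = rng.standard_normal(d)
    theta_hat = theta + 0.5 * rng.standard_normal(d)         # a perturbed "estimate"
    tv = tv_isotropic(-theta, -theta_hat, sigma)             # rho_0 versus rho~_0
    pinsker = math.sqrt(kl_isotropic(-theta, -theta_hat, sigma) / 2.0)
    closed_form = float(np.linalg.norm(theta - theta_hat)) / (2.0 * sigma)
    results.append((tv, pinsker, closed_form))
    print(f"TV = {tv:.4f} <= sqrt(KL/2) = {pinsker:.4f}, ||theta - theta_hat||/(2 sigma) = {closed_form:.4f}")

# Cross-check the TV formula against a Riemann sum in d = 1.
grid = np.linspace(-20.0, 20.0, 400_001)
step = grid[1] - grid[0]
p = np.exp(-0.5 * grid**2) / math.sqrt(2.0 * math.pi)            # N(0, 1)
q = np.exp(-0.5 * (grid - 1.0) ** 2) / math.sqrt(2.0 * math.pi)  # N(1, 1)
tv_grid = 0.5 * np.sum(np.abs(p - q)) * step
print(f"grid TV = {tv_grid:.6f}, formula = {tv_isotropic(np.array([0.0]), np.array([1.0]), 1.0):.6f}")
```

The check also shows that Pinsker's inequality is loose here: the exact total variation is at most $\delta/\sqrt{2\pi} \approx 0.4\,\delta$, while the bound is $\delta/2$.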

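Adding the two bounds and applying Lemma 1, then taking the expectation over the training sample on which $\hat\theta$ depends, we obtain

$$E[1\{f(X) \neq Y\}] - E[1\{f^*(X) \neq Y\}] \le E_{(X_1, Y_1), \dots, (X_n, Y_n)}\left[\frac{\|\theta - \hat\theta((X_1, Y_1), \dots, (X_n, Y_n))\|_2}{\sigma}\right]. \qquad (1)$$

Now we have $E[\hat\theta] = \theta$ and $\mathrm{Cov}[\hat\theta] = \frac{\sigma^2 I}{n}$. Hence

$$E\big[\|\hat\theta - \theta\|_2^2\big] = \mathrm{tr}\Big(E\big[(\hat\theta - \theta)(\hat\theta - \theta)^T\big]\Big) = \mathrm{tr}\left(\frac{\sigma^2 I}{n}\right) = \frac{d\sigma^2}{n}.$$

By Jensen's inequality we get

$$E\big[\|\hat\theta - \theta\|_2\big] \le \sqrt{E\big[\|\hat\theta - \theta\|_2^2\big]} = \sigma\sqrt{\frac{d}{n}}. \qquad (2)$$

Combining (1) and (2) we obtain

$$E[1\{f(X) \neq Y\}] - E[1\{f^*(X) \neq Y\}] \le \sqrt{\frac{d}{n}},$$

and finally, using the Bayes error that we computed in class, we have

$$E[1\{f(X) \neq Y\}] \le \frac{\sigma^2}{\|\theta\|_2^2} + \sqrt{\frac{d}{n}}.$$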
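The bound can be checked by simulation. The Python sketch below (NumPy assumed) takes $P(Y=1) = 1/2$ and, purely for illustration, the hypothetical estimator $\hat\theta = \frac1n\sum_i (2Y_i - 1)X_i$, which satisfies $E[\hat\theta] = \theta$ and $\mathrm{Cov}[\hat\theta] = \sigma^2 I/n$ in this model; the estimator actually used in these notes is defined earlier and may differ. Since the plug-in rule is $f(x) = 1\{\hat\theta^T x > 0\}$, its risk has the closed form $\Phi\big(-\theta^T\hat\theta/(\sigma\|\hat\theta\|_2)\big)$, so the excess risk can be computed exactly for each training sample and compared with $\|\theta - \hat\theta\|_2/\sigma$, the sum of the two Pinsker bounds above.

```python
import math

import numpy as np


def std_normal_cdf(t):
    """Phi(t) via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def plug_in_error(theta, theta_hat, sigma):
    """Exact risk of f(x) = 1{theta_hat^T x > 0} when X | Y=1 ~ N(theta, sigma^2 I),
    X | Y=0 ~ N(-theta, sigma^2 I) and P(Y=1) = 1/2."""
    margin = float(theta @ theta_hat) / (sigma * float(np.linalg.norm(theta_hat)))
    return std_normal_cdf(-margin)


def simulate(theta, sigma, n, trials, seed=0):
    """Draw `trials` training sets of size n; return per-trial excess risk and ||theta_hat - theta||."""
    rng = np.random.default_rng(seed)
    d = theta.size
    bayes = std_normal_cdf(-float(np.linalg.norm(theta)) / sigma)   # f* uses theta itself
    excess, dist = np.empty(trials), np.empty(trials)
    for t in range(trials):
        signs = 2 * rng.integers(0, 2, size=n) - 1                  # +1 for Y=1, -1 for Y=0
        x = signs[:, None] * theta + sigma * rng.standard_normal((n, d))
        theta_hat = (signs[:, None] * x).mean(axis=0)               # hypothetical estimator
        excess[t] = plug_in_error(theta, theta_hat, sigma) - bayes
        dist[t] = np.linalg.norm(theta_hat - theta)
    return excess, dist


theta, sigma, n = np.full(5, 0.5), 1.0, 50
d = theta.size
excess, dist = simulate(theta, sigma, n, trials=2000)
print("mean excess risk:", excess.mean())
print("E||theta_hat - theta||  :", dist.mean(), "<= sigma*sqrt(d/n) =", sigma * math.sqrt(d / n))
print("E||theta_hat - theta||^2:", (dist**2).mean(), "vs d*sigma^2/n =", d * sigma**2 / n)
```

The gap between the simulated $E\|\hat\theta - \theta\|_2$ and $\sigma\sqrt{d/n}$ is the Jensen gap in (2); the excess risk itself is typically far below the bound, which reflects how loose Pinsker's inequality is in this model.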


Lemma 2 (restatement of Lemma 1).

$$E[1\{f(X) \neq Y\}] - E[1\{f^*(X) \neq Y\}] \le \frac12\int |\rho_0(x) - \tilde\rho_0(x)|\,dx + \frac12\int |\rho_1(x) - \tilde\rho_1(x)|\,dx.$$

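Proof. Since each class has prior probability $\frac12$, for any classifier $f$ we have

$$\begin{aligned}
E[1\{f(X) \neq Y\}] &= \frac12\int 1\{f(x) = 1\}\rho_0(x)\,dx + \frac12\int 1\{f(x) = 0\}\rho_1(x)\,dx\\
&= \frac12\int 1\{f(x) = 1\}\rho_0(x)\,dx + \frac12\int \big(1 - 1\{f(x) = 1\}\big)\rho_1(x)\,dx\\
&= \frac12 + \frac12\int 1\{f(x) = 1\}\rho_0(x)\,dx - \frac12\int 1\{f(x) = 1\}\rho_1(x)\,dx\\
&= \frac12 + \frac12\int 1\{f(x) = 1\}\big(\rho_0(x) - \rho_1(x)\big)\,dx,
\end{aligned}$$

where the third line uses $\int \rho_1(x)\,dx = 1$.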

Therefore, for the classifiers $f$ and $f^*$ we have

$$E[1\{f(X) \neq Y\}] - E[1\{f^*(X) \neq Y\}] = \frac12\int \big(1\{f(x) = 1\} - 1\{f^*(x) = 1\}\big)\big(\rho_0(x) - \rho_1(x)\big)\,dx.$$
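Both identities above turn an error probability into an integral over the region where a classifier predicts 1. A small Python check (NumPy assumed; the one-dimensional model with $\theta = \sigma = 1$ and the threshold rule $f(x) = 1\{x > c\}$ are illustrative choices, not from the notes) compares the integral form $\frac12 + \frac12\int 1\{f(x) = 1\}(\rho_0(x) - \rho_1(x))\,dx$ with a Monte Carlo estimate of $P(f(X) \neq Y)$ under $P(Y=1) = 1/2$ and with the closed-form error of the threshold rule.

```python
import math

import numpy as np


def gauss_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def std_normal_cdf(t):
    """Phi(t) via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


theta, sigma, c = 1.0, 1.0, 0.4      # rho_0 = N(-theta, sigma^2), rho_1 = N(theta, sigma^2)


def f(x):
    """Illustrative threshold classifier f(x) = 1{x > c}."""
    return (x > c).astype(int)


# Integral form: 1/2 + 1/2 * int 1{f(x)=1} (rho_0(x) - rho_1(x)) dx, by a Riemann sum.
x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
diff = gauss_pdf(x, -theta, sigma) - gauss_pdf(x, theta, sigma)
formula = 0.5 + 0.5 * np.sum((f(x) == 1) * diff) * dx

# Monte Carlo estimate of P(f(X) != Y).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1_000_000)
xs = (2 * y - 1) * theta + sigma * rng.standard_normal(y.size)
monte_carlo = float(np.mean(f(xs) != y))

# Closed form: 1/2 P(X > c | Y=0) + 1/2 P(X <= c | Y=1).
exact = 0.5 * (1.0 - std_normal_cdf((c + theta) / sigma)) + 0.5 * std_normal_cdf((c - theta) / sigma)
print(f"integral form = {formula:.4f}, Monte Carlo = {monte_carlo:.4f}, closed form = {exact:.4f}")
```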

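Now we prove that, for every $x$,

$$\big(1\{f(x) = 1\} - 1\{f^*(x) = 1\}\big)\big(\rho_0(x) - \rho_1(x)\big) \le |\rho_0(x) - \tilde\rho_0(x)| + |\rho_1(x) - \tilde\rho_1(x)|; \qquad (3)$$

integrating (3) and multiplying by $\frac12$ then gives the lemma. Where $f(x) = f^*(x)$ the LHS of (3) is zero, so it suffices to consider points where the two classifiers disagree. Suppose first that $f(x) = 1$ and $f^*(x) = 0$, that is, $\rho_1(x) \le \rho_0(x)$ and $\tilde\rho_0(x) < \tilde\rho_1(x)$. Therefore, the LHS of inequality (3) is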

