
CS 229: Machine Learning

Problem Set 0

William Ma

July 20, 2017


1 Question 1
1a Part a
Given $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector, we can calculate $\nabla_x f(x)$ by taking the partial derivative with respect to each $x_k$:
\begin{align*}
\frac{\partial f(x)}{\partial x_k}
&= \frac{\partial}{\partial x_k} \Big[ \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} x_i A_{ij} x_j + \sum_{i=1}^{n} b_i x_i \Big] \\
&= \frac{\partial}{\partial x_k} \frac{1}{2} \Big[ \sum_{i \neq k} \sum_{j \neq k} A_{ij} x_i x_j + \sum_{i \neq k} A_{ik} x_i x_k + \sum_{j \neq k} A_{kj} x_k x_j + A_{kk} x_k^2 \Big] + \frac{\partial}{\partial x_k} \sum_{i=1}^{n} b_i x_i \\
&= \frac{1}{2} \sum_{i \neq k} A_{ik} x_i + \frac{1}{2} \sum_{j \neq k} A_{kj} x_j + A_{kk} x_k + b_k \\
&= \frac{1}{2} \sum_{i=1}^{n} A_{ik} x_i + \frac{1}{2} \sum_{j=1}^{n} A_{kj} x_j + b_k \\
&= \sum_{i=1}^{n} A_{ik} x_i + b_k
\end{align*}

Now we can see that $\nabla_x f(x) = Ax + b$, where the final step above uses the symmetry of $A$ ($A_{kj} = A_{jk}$) to combine the two sums.
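As a quick numerical sanity check (not part of the derivation), a finite-difference gradient of $f$ should match $Ax + b$. The sketch below assumes NumPy; the random symmetric matrix, vector, and seed are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # a random symmetric A
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: 0.5 * v @ A @ v + b @ v  # f(x) = (1/2) x^T A x + b^T x

# central differences along each coordinate approximate the gradient
eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

print(np.allclose(num_grad, A @ x + b, atol=1e-5))  # expected: True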

1b Part b
Given that $f(x) = g(h(x))$, where $g : \mathbb{R} \to \mathbb{R}$ is differentiable and $h : \mathbb{R}^n \to \mathbb{R}$ is differentiable, we can expand $\nabla_x f(x)$ componentwise to arrive at the solution.

\[
\frac{\partial f(x)}{\partial x_k} = \frac{\partial}{\partial x_k} g(h(x))
\]
By invoking the chain rule,
\[
\frac{\partial f(x)}{\partial x_k} = g'(h(x)) \frac{\partial}{\partial x_k} h(x)
\]
Combining these back into a vector,
\[
\nabla_x f(x) =
\begin{bmatrix}
g'(h(x)) \frac{\partial}{\partial x_1} h(x) \\
\vdots \\
g'(h(x)) \frac{\partial}{\partial x_n} h(x)
\end{bmatrix}
= g'(h(x)) \nabla_x h(x)
\]
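As an illustrative check of the chain rule formula (assuming NumPy; the choices $g = \sin$ and $h(x) = x^T x$ are arbitrary examples), the finite-difference gradient of $g(h(x))$ should match $g'(h(x)) \nabla_x h(x)$.

import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal(n)

g, g_prime = np.sin, np.cos        # example g : R -> R and its derivative
h = lambda v: v @ v                # example h(x) = x^T x, with gradient 2x
f = lambda v: g(h(v))

eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

print(np.allclose(num_grad, g_prime(h(x)) * 2 * x, atol=1e-5))  # expected: True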

1c Part c
Given $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector, we can calculate the Hessian as follows.
From Part a, $\frac{\partial f(x)}{\partial x_k} = \sum_{i=1}^{n} A_{ik} x_i + b_k$, so the $(l, k)$ entry of the Hessian is
\begin{align*}
\frac{\partial^2 f(x)}{\partial x_l \, \partial x_k}
&= \frac{\partial}{\partial x_l} \Big[ \sum_{i=1}^{n} A_{ik} x_i + b_k \Big] \\
&= A_{lk}
\end{align*}

Thus, $\nabla_x^2 f(x) = A$.
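The same kind of numerical check works here (assuming NumPy; the random symmetric $A$ is only an illustration): finite differences of the gradient $Ax + b$ from Part a should recover $A$.

import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                  # a random symmetric A
b = rng.standard_normal(n)
x = rng.standard_normal(n)

grad = lambda v: A @ v + b         # gradient from Part a

# differencing the gradient along e_l gives column l of the Hessian,
# which equals row l because A is symmetric
eps = 1e-6
num_hess = np.array([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

print(np.allclose(num_hess, A, atol=1e-5))  # expected: True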

1d Part d
Given $f(x) = g(a^T x)$, where $g : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector, we can calculate $\nabla_x f(x)$ using the results from Parts a and b:

\begin{align*}
\nabla_x f(x) &= g'(a^T x) \, \nabla_x (a^T x) \\
&= g'(a^T x) \, a
\end{align*}

However, for the Hessian, we have to expand, apply the chain rule to each term, and then recombine the entries into a matrix.
\begin{align*}
\frac{\partial^2 f(x)}{\partial x_i \, \partial x_j}
&= \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} g(a^T x) \\
&= g''(a^T x) \Big( \frac{\partial}{\partial x_i} \sum_{k=1}^{n} a_k x_k \Big) \Big( \frac{\partial}{\partial x_j} \sum_{l=1}^{n} a_l x_l \Big) \\
&= g''(a^T x) \, a_i a_j
\end{align*}
\[
\nabla_x^2 f(x) =
\begin{bmatrix}
g''(a^T x) a_1 a_1 & \cdots & g''(a^T x) a_1 a_n \\
\vdots & \ddots & \vdots \\
g''(a^T x) a_n a_1 & \cdots & g''(a^T x) a_n a_n
\end{bmatrix}
= g''(a^T x) \, a a^T
\]

Thus, $\nabla_x^2 f(x) = g''(a^T x) \, a a^T$.
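A numerical sketch of both formulas (assuming NumPy; $g = \exp$ is an arbitrary example, so $g' = g'' = \exp$): finite differences of $f(x) = g(a^T x)$ should match $g'(a^T x)\,a$ and $g''(a^T x)\,aa^T$.

import numpy as np

rng = np.random.default_rng(3)
n = 4
a = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: np.exp(a @ v)        # g = exp, so g' = g'' = exp

eps = 1e-4
I = np.eye(n)
num_grad = np.array([(f(x + eps * ei) - f(x - eps * ei)) / (2 * eps) for ei in I])
# second-order central differences approximate each Hessian entry
num_hess = np.array([[(f(x + eps * (ei + ej)) - f(x + eps * (ei - ej))
                       - f(x + eps * (ej - ei)) + f(x - eps * (ei + ej))) / (4 * eps**2)
                      for ej in I] for ei in I])

s = np.exp(a @ x)                  # g'(a^T x) = g''(a^T x) = exp(a^T x)
print(np.allclose(num_grad, s * a, atol=1e-4))               # expected: True
print(np.allclose(num_hess, s * np.outer(a, a), atol=1e-3))  # expected: True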

2 Problem 2
2a Part a
Proof. Given $z \in \mathbb{R}^n$ and $A = zz^T$, we have $A \in S_+^{n \times n}$ if $A = A^T$ and $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$.

For symmetry,
\[
A^T = (zz^T)^T = (z^T)^T z^T = zz^T = A
\]
Thus, $A = A^T$.

For the quadratic form, for any $x \in \mathbb{R}^n$,
\[
x^T A x = x^T z z^T x = (x^T z)(x^T z)^T = (x^T z)^2 \geq 0
\]
Thus, since $A = A^T$ and $x^T A x \geq 0$, $A \in S_+^{n \times n}$.
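A small numerical illustration (assuming NumPy; $z$ and the test vectors are random examples): $zz^T$ should be symmetric, and every sampled quadratic form $x^T (zz^T) x$ should be nonnegative.

import numpy as np

rng = np.random.default_rng(4)
n = 5
z = rng.standard_normal(n)
A = np.outer(z, z)                    # A = z z^T

print(np.allclose(A, A.T))            # symmetry: expected True
xs = rng.standard_normal((1000, n))   # many random test vectors x
quad = np.einsum('ij,jk,ik->i', xs, A, xs)  # x^T A x for each row x of xs
print(np.all(quad >= -1e-12))         # nonnegative up to round-off: expected True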

2b Part b
Given that $z \in \mathbb{R}^n$ is a non-zero vector and $A = zz^T$, the null space of $A$ consists exactly of the vectors orthogonal to $z$, since
\[
Ax = zz^T x = z(z^T x) = 0
\]
only when $z^T x = 0$ (because $z \neq 0$). The set $\{x \in \mathbb{R}^n : z^T x = 0\}$ is a subspace of dimension $n - 1$, so the null space of $A$ has dimension $n - 1$. Using the rank-nullity theorem, the rank of $A$ is $1$.
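A small check of the rank claim (assuming NumPy; the specific $z$ and the orthogonal test vector are arbitrary examples):

import numpy as np

z = np.array([1.0, -2.0, 3.0, 0.5])   # any non-zero vector
A = np.outer(z, z)                    # A = z z^T

print(np.linalg.matrix_rank(A))       # expected: 1
x = np.array([2.0, 1.0, 0.0, 0.0])    # orthogonal to z, since z^T x = 0
print(A @ x)                          # expected: the zero vector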

2c Part c
Proof. Given $A \in S_+^{n \times n}$ and an arbitrary $B \in \mathbb{R}^{m \times n}$, first check symmetry:
\[
(BAB^T)^T = (B^T)^T A^T B^T = BAB^T
\]
Thus, $BAB^T = (BAB^T)^T$.

Next, for any $x \in \mathbb{R}^m$,
\[
x^T BAB^T x = (x^T B) A (x^T B)^T
\]
Since $A \in S_+^{n \times n}$, $y^T A y \geq 0$ for every $y \in \mathbb{R}^n$. We can simply let $y = (x^T B)^T = B^T x$ for $(x^T B) A (x^T B)^T \geq 0$ to be true. Thus, since $BAB^T = (BAB^T)^T$ and $x^T BAB^T x \geq 0$, $BAB^T \in S_+^{m \times m}$.
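A numerical spot check (assuming NumPy; $A$ is built as $CC^T$ purely to obtain a random PSD matrix): the eigenvalues of $BAB^T$ should be nonnegative up to floating-point error.

import numpy as np

rng = np.random.default_rng(5)
n, m = 5, 3
C = rng.standard_normal((n, n))
A = C @ C.T                           # a random symmetric PSD matrix
B = rng.standard_normal((m, n))

M = B @ A @ B.T
print(np.allclose(M, M.T))                      # symmetry: expected True
print(np.all(np.linalg.eigvalsh(M) >= -1e-10))  # eigenvalues >= 0: expected True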

3 Problem 3
3a Part a
Proof. Given that $A$ is diagonalizable, so that $A = T \Lambda T^{-1}$ with $\Lambda$ diagonal, and that $t^{(i)} \in \mathbb{R}^n$ is the $i$-th column of $T$,
\[
A t^{(i)} = T \Lambda T^{-1} t^{(i)}
\]
Since $T^{-1} T = I$, multiplying $T^{-1}$ by $t^{(i)}$, the $i$-th column of $T$, always returns $e_i$, the $i$-th column of the identity matrix. Thus,
\[
A t^{(i)} = T \Lambda T^{-1} t^{(i)} = T \Lambda e_i = \lambda_i T e_i = \lambda_i t^{(i)}
\]
Thus, $A t^{(i)} = \lambda_i t^{(i)}$, where $(t^{(i)}, \lambda_i)$ is an eigenvector/eigenvalue pair of $A$.
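A sketch of a numerical check (assuming NumPy; $T$ and $\Lambda$ are random illustrations): build $A = T \Lambda T^{-1}$ and confirm each column of $T$ is an eigenvector with the matching diagonal entry of $\Lambda$ as its eigenvalue.

import numpy as np

rng = np.random.default_rng(6)
n = 4
T = rng.standard_normal((n, n))       # a random T is invertible with probability 1
lam = rng.standard_normal(n)          # diagonal entries of Lambda
A = T @ np.diag(lam) @ np.linalg.inv(T)

for i in range(n):
    t_i = T[:, i]
    print(np.allclose(A @ t_i, lam[i] * t_i))   # expected: True for every i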

3b Part b
Proof. Given that $A$ is symmetric, so that $A = U \Lambda U^T$ with $U$ orthogonal and $\Lambda$ diagonal, and that $u^{(i)} \in \mathbb{R}^n$ is the $i$-th column of $U$,
\[
A u^{(i)} = U \Lambda U^T u^{(i)} = U \Lambda U^{-1} u^{(i)}
\]
since $U^T = U^{-1}$ for an orthogonal matrix. We can use the result from Part a to get $A u^{(i)} = \lambda_i u^{(i)}$, where $(u^{(i)}, \lambda_i)$ is an eigenvector/eigenvalue pair of $A$.
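The symmetric case can be checked numerically as well (assuming NumPy; the random symmetric matrix is only an illustration): np.linalg.eigh returns the eigenvalues and an orthogonal $U$, and each column of $U$ should satisfy $A u^{(i)} = \lambda_i u^{(i)}$.

import numpy as np

rng = np.random.default_rng(7)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                     # a random symmetric matrix

lam, U = np.linalg.eigh(A)            # A = U diag(lam) U^T with U orthogonal
print(np.allclose(U.T @ U, np.eye(n)))  # U is orthogonal: expected True
print(np.allclose(A @ U, U * lam))      # A u_i = lam_i u_i for every column: expected True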

3c Part c
Proof. Given $A \in S_+^{n \times n}$ with $A = U \Lambda U^T$ as in Part b, and an eigenvalue $\lambda_i$ of $A$, for any $x \in \mathbb{R}^n$,
\[
0 \leq x^T A x = x^T U \Lambda U^T x = (U^T x)^T \Lambda (U^T x)
\]
Since $U$ is invertible, $y = U^T x$ ranges over all of $\mathbb{R}^n$ as $x$ does, so for every $y$ we have $y^T \Lambda y = \sum_{j=1}^{n} \lambda_j y_j^2 \geq 0$ because $\Lambda$ is diagonal. Choosing $y = e_i$ gives $\lambda_i \geq 0$.
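Numerically (assuming NumPy; the PSD matrix is built as $CC^T$ only for illustration), the eigenvalues returned by np.linalg.eigvalsh should all be nonnegative up to round-off.

import numpy as np

rng = np.random.default_rng(8)
n = 5
C = rng.standard_normal((n, n))
A = C @ C.T                           # a random symmetric PSD matrix

lam = np.linalg.eigvalsh(A)           # eigenvalues of A in ascending order
print(np.all(lam >= -1e-10))          # expected: True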
