
Math 555-Homework 2

Guangri Xue

Problem 1 (4.1). Let $f(x) = 10(x_2 - x_1^2)^2 + (1 - x_1)^2$. At $x = (0, -1)$ draw the contour lines of the quadratic model
$$m_k(p) = f_k + \nabla f_k^T p + \tfrac{1}{2} p^T B_k p \tag{1}$$
assuming that $B$ is the Hessian of $f$. Draw the family of solutions of
$$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + \nabla f_k^T p + \tfrac{1}{2} p^T B_k p \quad \text{s.t. } \|p\| \le \Delta_k \tag{2}$$
as the trust region varies from $\Delta = 0$ to $\Delta = 2$. Repeat this at $x = (0, 0.5)$.

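Solutions: Compute
$$\nabla f = \begin{bmatrix} -40x_1(x_2 - x_1^2) - 2(1 - x_1) \\ 20(x_2 - x_1^2) \end{bmatrix}, \qquad \nabla^2 f = \begin{bmatrix} -40x_2 + 120x_1^2 + 2 & -40x_1 \\ -40x_1 & 20 \end{bmatrix}.$$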

$p^*$ is a solution of (2) if and only if there exists $\lambda \ge 0$ such that $(B + \lambda I)p^* = -\nabla f$, $\lambda(\Delta - \|p^*\|) = 0$, and $B + \lambda I \succeq 0$.
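At $x = (0, -1)$,
$$f_k = 11, \qquad \nabla f_k = [-2, -20]^T, \qquad B_k = \begin{bmatrix} 42 & 0 \\ 0 & 20 \end{bmatrix}.$$
Since $(B_k + \lambda I)p^* = -\nabla f_k$, we get
$$p^* = \left[\frac{2}{42 + \lambda}, \frac{20}{20 + \lambda}\right]^T.$$
If $\lambda = 0$, $\|p^*\| = \sqrt{442}/21$. If $\lambda > 0$, $\|p^*\| < \sqrt{442}/21$. Hence
$$p^* = \begin{cases} [1/21, 1]^T, & \sqrt{442}/21 \le \Delta \le 2, \\[4pt] \left[\dfrac{2}{42 + \lambda}, \dfrac{20}{20 + \lambda}\right]^T \text{ with } \|p^*\| = \Delta, & 0 < \Delta < \sqrt{442}/21. \end{cases}$$
When $0 < \Delta < \sqrt{442}/21$, we use Newton's method to find $\lambda$. See Figures 1 and 2 for the contours and the solutions.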
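The Newton search for $\lambda$ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the figures: the function name `tr_step_diag`, the starting value $\lambda = 0$, and the tolerance are hypothetical choices, and the routine assumes a diagonal, positive definite $B_k$. Newton's method is applied to $\phi(\lambda) = 1/\Delta - 1/\|p(\lambda)\|$.

```python
import numpy as np

def tr_step_diag(d, g, delta, tol=1e-12, max_iter=100):
    """Trust-region step for the model g^T p + 0.5 p^T diag(d) p, with d > 0.

    Returns (p, lam). Illustrative sketch only: assumes a diagonal,
    positive definite Hessian and delta > 0.
    """
    d = np.asarray(d, dtype=float)
    g = np.asarray(g, dtype=float)
    p = -g / d                          # unconstrained minimizer (lambda = 0)
    if np.linalg.norm(p) <= delta:
        return p, 0.0
    lam = 0.0
    for _ in range(max_iter):
        p = -g / (d + lam)
        norm_p = np.linalg.norm(p)
        phi = 1.0 / delta - 1.0 / norm_p
        if abs(phi) < tol:
            break
        # d/dlam ||p(lam)||^2 = -2 * sum_i g_i^2 / (d_i + lam)^3
        dnorm2 = -2.0 * np.sum(g**2 / (d + lam) ** 3)
        dphi = 0.5 * norm_p ** (-3.0) * dnorm2
        lam -= phi / dphi               # Newton update on phi
    return -g / (d + lam), lam

# Data of Problem 1 at x = (0, -1): B_k = diag(42, 20), grad f_k = (-2, -20).
p_small, lam_small = tr_step_diag([42.0, 20.0], [-2.0, -20.0], 0.5)
p_full, lam_full = tr_step_diag([42.0, 20.0], [-2.0, -20.0], 2.0)
```

For $\Delta = 2$ the constraint is inactive and the routine returns the full step $[1/21, 1]^T$ with $\lambda = 0$; for $\Delta = 0.5$ it returns a step on the boundary.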

Figure 1: Contour of $m_k$ at $x = (0, -1)$ with solutions.
Figure 2: Zoom of Figure 1.
Figure 3: Contour of $m_k$ at $x = (0, 0.5)$ with solutions.
Figure 4: Zoom of Figure 3.

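At $x = (0, 0.5)$,
$$f_k = \frac{7}{2}, \qquad \nabla f_k = [-2, 10]^T, \qquad B_k = \begin{bmatrix} -18 & 0 \\ 0 & 20 \end{bmatrix}.$$
Since $B_k + \lambda I \succeq 0$, we need $\lambda \ge 18$. Because $\lambda > 0$, the complementarity condition gives $\|p^*\| = \Delta$. From $(B_k + \lambda I)p^* = -\nabla f_k$ we get
$$p^* = \left[\frac{2}{-18 + \lambda}, \frac{-10}{20 + \lambda}\right]^T, \quad \text{with } \|p^*\| = \Delta, \quad 0 < \Delta \le 2.$$
See Figures 3 and 4 for the contours and the solutions.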
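For the indefinite case at $x = (0, 0.5)$, the multiplier must lie in $\lambda > 18$, where $\|p(\lambda)\|$ decreases from $+\infty$ to $0$. The following Python sketch finds $\lambda$ by bisection. The bisection itself, the bracket expansion, and the names `p_of_lambda` and `solve_lambda` are assumptions chosen for illustration; the original solution does not say which root finder was used here.

```python
import numpy as np

def p_of_lambda(lam):
    # Solves (B_k + lam I) p = -grad f_k with B_k = diag(-18, 20)
    # and grad f_k = (-2, 10); valid for lam > 18.
    return np.array([2.0 / (lam - 18.0), -10.0 / (lam + 20.0)])

def solve_lambda(delta, iters=100):
    """Bisection for ||p(lam)|| = delta on lam > 18 (illustrative sketch)."""
    lo = 18.0                 # ||p(lam)|| grows without bound as lam -> 18+
    hi = 19.0
    # Expand the bracket until the step at hi is inside the trust region.
    while np.linalg.norm(p_of_lambda(hi)) > delta:
        hi = 18.0 + 2.0 * (hi - 18.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(p_of_lambda(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return hi

lams = {delta: solve_lambda(delta) for delta in (0.5, 1.0, 2.0)}
```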

Problem 2 (4.2). Write a program that implements the dogleg method. Choose $B_k$ to be the exact Hessian. Apply it to solve Rosenbrock's function
$$f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2.$$
Experiment with the update rule for the trust region by changing the constants in Algorithm 4.1, or by designing your own rules.

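Solutions:
$$\nabla f(x) = \begin{bmatrix} -400x_1(x_2 - x_1^2) - 2(1 - x_1) \\ 200(x_2 - x_1^2) \end{bmatrix}, \qquad \nabla^2 f(x) = \begin{bmatrix} -400x_2 + 1200x_1^2 + 2 & -400x_1 \\ -400x_1 & 200 \end{bmatrix}.$$

We use the same initial guesses $x_0$ as in Problem 1 of Chapter 3 (Problem 3.1).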


We take $\hat{\Delta} = 0.1$ and $\eta = 10^{-3}$.
For $x_0 = (1.2, 1.2)^T$, the convergence history is as follows.

f(x) ||Grad f(x)|| ferr itnum


1.94e+01 2.29e+02 1.00e+00 0
1.41e+01 1.92e+02 5.27e+00 1
6.26e+00 1.24e+02 7.88e+00 2
4.06e-02 9.15e+00 6.22e+00 3
1.96e-03 1.45e-01 3.87e-02 4
3.46e-04 8.33e-01 1.61e-03 5
9.23e-08 8.86e-04 3.46e-04 6
8.53e-13 4.13e-05 9.23e-08 7
5.64e-25 2.24e-12 8.53e-13 8

Approximate solution: (1.00e+00, 1.00e+00)

For this initial guess, the Hessian matrix is positive definite at every iterate (checked in the code), so the dogleg method should work well. In fact, $x_0$ is close to the exact solution, and as a result the method behaves like Newton's method. In addition, $\|\nabla f(x)\|$ decreases at every iteration except iteration 5.
With $\eta = 0$ and $\hat{\Delta} = 0.1$,

f(x) ||Grad f(x)|| ferr itnum


5.80e+00 1.25e+02 1.00e+00 0
3.11e+00 9.01e+01 2.69e+00 1
2.65e-01 2.49e+01 2.85e+00 2
1.41e-02 3.03e-01 2.51e-01 3
6.26e-03 7.86e-01 7.89e-03 4
1.54e-03 8.56e-01 4.71e-03 5
1.29e-04 3.27e-01 1.41e-03 6

1.60e-06 3.03e-02 1.28e-04 7
3.26e-10 5.50e-04 1.60e-06 8
1.39e-17 8.98e-08 3.26e-10 9

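One can see that $\|\nabla f(x)\|$ does not decrease monotonically (it increases at iterations 4 and 5). This is consistent with the convergence theory for $\eta = 0$, which only guarantees $\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$.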
We also test the method with more conservative parameters: $\hat{\Delta} = 0.01$ and $\eta = 10^{-5}$. It took 31 iterations instead of the previous 8.
Now we take $x_0 = (-1.2, 1)^T$. In this case, the Hessian is not always positive definite. However, the iteration converges in 190 iterations with $\hat{\Delta} = 0.1$ and $\eta = 10^{-3}$. Of course, we cannot guarantee that $\|\nabla f(x)\|$ decreases monotonically.
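The experiments above can be reproduced in spirit with the following Python sketch of a trust-region loop using a dogleg step on Rosenbrock's function. It is not the program that generated the tables. The names (`dogleg_step`, `trust_region`, `delta_max`, `eta`), the initial radius `delta_max / 4`, the radius-update constants 1/4, 3/4 and 2, the gradient-norm stopping test, and the fallback to the Cauchy point when the Hessian is not positive definite are all assumptions made for illustration.

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def hess(x):
    return np.array([[-400.0 * x[1] + 1200.0 * x[0] ** 2 + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

def dogleg_step(g, B, delta):
    """Dogleg step; falls back to the Cauchy point if B is not positive definite."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0.0:                          # negative curvature along -g
        return -(delta / gnorm) * g
    p_u = -(g @ g) / gBg * g                # unconstrained Cauchy step
    if np.linalg.norm(p_u) >= delta:
        return -(delta / gnorm) * g
    if np.any(np.linalg.eigvalsh(B) <= 0.0):
        return p_u                          # assumption: Cauchy point fallback
    p_b = -np.linalg.solve(B, g)            # full (Newton) step
    if np.linalg.norm(p_b) <= delta:
        return p_b
    # Intersect the segment p_u + t (p_b - p_u), t in [0, 1], with ||p|| = delta.
    d = p_b - p_u
    a, b, c = d @ d, 2.0 * (p_u @ d), p_u @ p_u - delta ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_u + t * d

def trust_region(x0, delta_max=0.1, eta=1e-3, tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    delta = delta_max / 4.0                 # assumed initial radius
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        B = hess(x)
        p = dogleg_step(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)
        rho = (rosenbrock(x) - rosenbrock(x + p)) / predicted
        if rho < 0.25:
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)
        if rho > eta:                       # accept the step
            x = x + p
    return x, max_iter

x_star, n_iter = trust_region([1.2, 1.2])
```

Changing `delta_max` and `eta` in the call is one way to repeat the parameter experiments described above; iteration counts from this sketch need not match the tables exactly.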

Problem 3 (4.6). The Cauchy-Schwarz inequality states that for any vectors $u$ and $v$, we have
$$|(u, v)|^2 \le (u, u)(v, v)$$
with equality only when $u$ and $v$ are parallel. When $B$ is positive definite, use this inequality to show that
$$\gamma := \frac{\|g\|^4}{(g, Bg)(g, B^{-1}g)} \le 1$$
with equality only if $g$ and $Bg$ (and $B^{-1}g$) are parallel.

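Proof. First of all, $B$ must be symmetric. If not, the following is a counterexample. Take
$$B = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}. \quad \text{Then} \quad B^{-1} = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}.$$
Let $g = [g_1, g_2]^T$. Then $(g, Bg) = g_1^2 + g_2^2$ and $(g, B^{-1}g) = \tfrac{1}{2}(g_1^2 + g_2^2)$. As a result $\gamma = 2 > 1$.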
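So assume $B$ is symmetric positive definite. Let $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be its eigenvalues and let $x_i$ $(i = 1, 2, \dots, n)$ be the corresponding orthonormal eigenvectors. Write $g = \sum_{i=1}^{n} c_i x_i$. We have
$$(g, Bg) = \sum_{i=1}^{n} \lambda_i c_i^2, \qquad (g, B^{-1}g) = \sum_{i=1}^{n} \lambda_i^{-1} c_i^2, \qquad (g, g) = \sum_{i=1}^{n} c_i^2.$$
By the Cauchy-Schwarz inequality,
$$\left(\sum_{i=1}^{n} c_i^2\right)^2 = \left(\sum_{i=1}^{n} (\lambda_i^{1/2} c_i)(\lambda_i^{-1/2} c_i)\right)^2 \le \left(\sum_{i=1}^{n} \lambda_i c_i^2\right)\left(\sum_{i=1}^{n} \lambda_i^{-1} c_i^2\right).$$
This yields the desired result.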
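The bound and the counterexample can also be checked numerically. The Python sketch below is only a sanity check under assumptions of my own choosing: the helper name `gamma`, the random test matrix, and the seed are hypothetical and not part of the original solution.

```python
import numpy as np

def gamma(B, g):
    """gamma = ||g||^4 / ((g, Bg) (g, B^{-1} g))."""
    return (g @ g) ** 2 / ((g @ B @ g) * (g @ np.linalg.solve(B, g)))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B_spd = A @ A.T + 5.0 * np.eye(5)        # symmetric positive definite
g = rng.standard_normal(5)
gamma_spd = gamma(B_spd, g)              # should not exceed 1

# Equality case: g is an eigenvector of B, so g and Bg are parallel.
w, V = np.linalg.eigh(B_spd)
gamma_eig = gamma(B_spd, V[:, 0])

# Non-symmetric counterexample from the proof: gamma = 2 for any g != 0.
B_nonsym = np.array([[1.0, -1.0], [1.0, 1.0]])
gamma_nonsym = gamma(B_nonsym, np.array([3.0, 4.0]))
```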

Problem 4 (4.7). When $B$ is positive definite, the double-dogleg method constructs a path with three line segments from the origin to the full step. The four points that define the path are

the origin;
the unconstrained Cauchy step $p^C = -\frac{(g, g)}{(g, Bg)} g$;
a fraction of the full step $\bar{\gamma} p^B = -\bar{\gamma} B^{-1} g$, for some $\bar{\gamma} \in (\gamma, 1]$, where $\gamma$ is defined in the previous question; and
the full step $p^B = -B^{-1} g$.

Show that $\|p\|$ increases monotonically along this path.

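Proof. Parametrize the path as
$$p(\tau) = \begin{cases} \tau p^C, & 0 \le \tau \le 1, \\ p^C + (\tau - 1)(\bar{\gamma} p^B - p^C), & 1 \le \tau \le 2, \\ (\bar{\gamma} + (\tau - 2)(1 - \bar{\gamma}))\, p^B, & 2 \le \tau \le 3. \end{cases}$$

When $0 \le \tau \le 1$, it is obvious that $\|p(\tau)\|$ is a monotonically increasing function.

When $1 \le \tau \le 2$, let $h(\alpha) = \tfrac{1}{2}\|p(1 + \alpha)\|^2$. Then
$$h(\alpha) = \frac{1}{2}\|p^C\|^2 + \alpha (p^C, \bar{\gamma} p^B - p^C) + \frac{\alpha^2}{2}\|\bar{\gamma} p^B - p^C\|^2.$$
It is enough to show $h'(\alpha) > 0$ for $\alpha \in (0, 1)$.
$$\begin{aligned} h'(\alpha) &= (p^C, \bar{\gamma} p^B - p^C) + \alpha \|\bar{\gamma} p^B - p^C\|^2 \ \ge\ (p^C, \bar{\gamma} p^B - p^C) \\ &= \left(-\frac{(g, g)}{(g, Bg)} g,\ -\bar{\gamma} B^{-1} g + \frac{(g, g)}{(g, Bg)} g\right) \\ &= \frac{(g, g)}{(g, Bg)} \left(\bar{\gamma}\,(g, B^{-1} g) - \frac{(g, g)}{(g, Bg)} (g, g)\right). \end{aligned}$$
Since $\bar{\gamma} > \gamma$, we have $\bar{\gamma}\,(g, B^{-1} g) - \frac{(g, g)}{(g, Bg)} (g, g) > 0$. This implies $h'(\alpha) > 0$ for $\alpha \in (0, 1)$.

When $2 \le \tau \le 3$, it is obvious that $\|p(\tau)\|$ is monotonically increasing, because the coefficient $\bar{\gamma} + (\tau - 2)(1 - \bar{\gamma})$ of $p^B$ is positive and nondecreasing in $\tau$.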
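The monotonicity claim can be checked numerically by sampling the path. In the Python sketch below, the function name `double_dogleg_path`, the random test problem, and in particular the choice $\bar{\gamma} = 0.8\gamma + 0.2$ are assumptions made for illustration; any $\bar{\gamma} \in (\gamma, 1]$ would do.

```python
import numpy as np

def double_dogleg_path(B, g, gamma_bar, taus):
    """Sample the three-segment double-dogleg path at parameters tau in [0, 3]."""
    p_c = -(g @ g) / (g @ B @ g) * g         # unconstrained Cauchy step
    p_b = -np.linalg.solve(B, g)             # full step
    points = []
    for tau in taus:
        if tau <= 1.0:
            p = tau * p_c
        elif tau <= 2.0:
            p = p_c + (tau - 1.0) * (gamma_bar * p_b - p_c)
        else:
            p = (gamma_bar + (tau - 2.0) * (1.0 - gamma_bar)) * p_b
        points.append(p)
    return np.array(points)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = A @ A.T + 4.0 * np.eye(4)                # symmetric positive definite
g = rng.standard_normal(4)

gam = (g @ g) ** 2 / ((g @ B @ g) * (g @ np.linalg.solve(B, g)))
gamma_bar = 0.8 * gam + 0.2                  # assumed choice in (gamma, 1]

taus = np.linspace(0.0, 3.0, 301)
norms = np.linalg.norm(double_dogleg_path(B, g, gamma_bar, taus), axis=1)
```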

Problem 5 (4.9). Derive the solution of the two-dimensional subspace minimization problem in the case where $B$ is positive definite.

Solutions:
$$\min_{p}\ m(p) = f + g^T p + \tfrac{1}{2} p^T B p \quad \text{s.t. } \|p\| \le \Delta, \quad p \in \mathrm{span}[g, B^{-1} g].$$

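Let $p = \alpha g + \beta B^{-1} g$ and $u = (\alpha, \beta)^T$. Then
$$J(u) := \|p\|^2 = \alpha^2 \|g\|^2 + \beta^2 \|B^{-1} g\|^2 + 2\alpha\beta\,(g, B^{-1} g) = \tfrac{1}{2} u^T \bar{B} u,$$
where
$$\bar{B} = 2 \begin{bmatrix} \|g\|^2 & (B^{-1} g, g) \\ (B^{-1} g, g) & \|B^{-1} g\|^2 \end{bmatrix}.$$
It is easy to see that $\bar{B}$ is symmetric positive definite, provided $g$ and $B^{-1} g$ are linearly independent.
The constraint $\|p\| \le \Delta$ becomes the following constraint on $u$:
$$J(u) \le \Delta^2. \tag{3}$$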

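Let $h(u) := m(p)$. Then
$$\begin{aligned} h(u) &= f + (\alpha g + \beta B^{-1} g,\ g) + \tfrac{1}{2} (\alpha g + \beta B^{-1} g,\ \alpha B g + \beta g) \\ &= f + \alpha \|g\|^2 + \beta (B^{-1} g, g) + \frac{\alpha^2}{2} (g, Bg) + \frac{\beta^2}{2} (B^{-1} g, g) + \alpha\beta \|g\|^2 \\ &= f + \tilde{g}^T u + \tfrac{1}{2} u^T \tilde{B} u, \end{aligned}$$
where
$$\tilde{g} = \begin{bmatrix} \|g\|^2 \\ (B^{-1} g, g) \end{bmatrix}, \qquad \tilde{B} = \begin{bmatrix} (Bg, g) & \|g\|^2 \\ \|g\|^2 & (B^{-1} g, g) \end{bmatrix}.$$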
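Therefore the problem reduces to the following two-dimensional problem: under the constraint (3),
$$\min_{u \in \mathbb{R}^2} h(u). \tag{4}$$

It is easy to check that $\tilde{B}$ is symmetric positive definite: its determinant $(Bg, g)(B^{-1} g, g) - \|g\|^4$ is positive by Problem 3 when $g$ and $B^{-1} g$ are not parallel. Let $u^*$ be the minimizer of (4) with no constraint:
$$u^* = -\tilde{B}^{-1} \tilde{g}.$$
If $J(u^*) \le \Delta^2$, then $u^*$ is the solution.
If $J(u^*) > \Delta^2$, solve the following problem: for $\lambda \ge 0$,
$$\nabla h(u) + \lambda \nabla J(u) = 0.$$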

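This is equivalent to
$$(\tilde{B} + \lambda \bar{B}) u = -\tilde{g}.$$
The matrix $\tilde{B} + \lambda \bar{B}$ is symmetric positive definite, so the solution is
$$u = -(\tilde{B} + \lambda \bar{B})^{-1} \tilde{g}.$$

We summarize as follows: with the constraint $J(u) \le \Delta^2$,
$$\arg\min h(u) = \begin{cases} u^*, & J(u^*) \le \Delta^2, \\ -(\tilde{B} + \lambda \bar{B})^{-1} \tilde{g}, & J(u^*) > \Delta^2, \end{cases}$$
where $\lambda \ge 0$ satisfies the following algebraic equation:
$$J\left(-(\tilde{B} + \lambda \bar{B})^{-1} \tilde{g}\right) = \Delta^2.$$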
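The reduced two-dimensional problem can be solved numerically as in the following Python sketch. It assumes that $g$ and $B^{-1} g$ are linearly independent and solves the scalar equation for $\lambda$ by bisection; the bisection, the function name `subspace_step`, and the small test problem are assumptions made for illustration, since the derivation above does not prescribe a particular root finder.

```python
import numpy as np

def subspace_step(B, g, delta, iters=200):
    """Minimize m over span{g, B^{-1} g} subject to ||p|| <= delta (sketch).

    Assumes B is symmetric positive definite and that g and B^{-1} g
    are linearly independent.
    """
    Binv_g = np.linalg.solve(B, g)
    gg, gBg, gBinvg = g @ g, g @ B @ g, g @ Binv_g
    B_bar = 2.0 * np.array([[gg, gBinvg], [gBinvg, Binv_g @ Binv_g]])
    B_til = np.array([[gBg, gg], [gg, gBinvg]])
    g_til = np.array([gg, gBinvg])

    def J(u):                                   # J(u) = ||p||^2
        return 0.5 * u @ B_bar @ u

    def u_of(lam):                              # solves (B_til + lam B_bar) u = -g_til
        return -np.linalg.solve(B_til + lam * B_bar, g_til)

    u = u_of(0.0)                               # unconstrained minimizer u*
    if J(u) > delta ** 2:
        lo, hi = 0.0, 1.0
        while J(u_of(hi)) > delta ** 2:         # bracket the root of J = delta^2
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if J(u_of(mid)) > delta ** 2:
                lo = mid
            else:
                hi = mid
        u = u_of(hi)
    return u[0] * g + u[1] * Binv_g

B = np.diag([1.0, 2.0, 10.0])
g = np.array([1.0, 1.0, 1.0])
p_free = subspace_step(B, g, 10.0)              # constraint inactive
p_tr = subspace_step(B, g, 0.5)                 # constraint active
p_cauchy = -(g @ g) / (g @ B @ g) * g           # Cauchy point, for comparison
m = lambda p: g @ p + 0.5 * p @ B @ p
```

When the constraint is inactive, $u^* = (0, -1)^T$ and the step is the full step $-B^{-1} g$, which the sketch reproduces.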

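Problem 6 (4.8). Show that
$$\lambda^{(l+1)} = \lambda^{(l)} - \frac{\phi_2(\lambda^{(l)})}{\phi_2'(\lambda^{(l)})}, \quad \text{and} \quad \lambda^{(l+1)} = \lambda^{(l)} + \left(\frac{\|p_l\|}{\|q_l\|}\right)^2 \left(\frac{\|p_l\| - \Delta}{\Delta}\right)$$
are equivalent.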
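Proof: Recall that $\phi_2(\lambda) = \frac{1}{\Delta} - \frac{1}{\|p(\lambda)\|}$. From the hint, we have
$$\phi_2'(\lambda) = \frac{d}{d\lambda}\left(-\frac{1}{\|p(\lambda)\|}\right) = \frac{1}{2} \left(\|p(\lambda)\|^2\right)^{-3/2} \frac{d}{d\lambda} \|p(\lambda)\|^2, \tag{5}$$
and
$$\frac{d}{d\lambda} \|p(\lambda)\|^2 = -2 \sum_{j=1}^{n} \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^3}, \tag{6}$$
where $\lambda_j$ and $q_j$ are the eigenvalues and orthonormal eigenvectors of $B$. On the other hand, we have
$$\|q\|^2 = \sum_{j=1}^{n} \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^3}. \tag{7}$$

Note that
$$-\frac{\phi_2(\lambda)}{\phi_2'(\lambda)} = -\frac{\frac{1}{\Delta} - \frac{1}{\|p(\lambda)\|}}{\frac{1}{2} \left(\|p(\lambda)\|^2\right)^{-3/2} \frac{d}{d\lambda} \|p(\lambda)\|^2}.$$
By (6) and (7), we have
$$-\frac{\phi_2(\lambda)}{\phi_2'(\lambda)} = \frac{\|p(\lambda)\|^3}{\|q\|^2} \cdot \frac{\|p(\lambda)\| - \Delta}{\Delta \|p(\lambda)\|} = \left(\frac{\|p\|}{\|q\|}\right)^2 \left(\frac{\|p\| - \Delta}{\Delta}\right).$$

Therefore, (4.26) and (4.27) are equivalent. $\square$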
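The equivalence can be confirmed numerically by computing one update with each formula. In the Python sketch below, the Cholesky-based update uses $B + \lambda I = R^T R$ with $R^T R p = -g$ and $R^T q = p$, while the Newton update uses the eigenvalue formula (6). The function name `newton_updates` and the test data are assumptions chosen for illustration.

```python
import numpy as np

def newton_updates(B, g, lam, delta):
    """Return the next lambda from the Newton formula and from the Cholesky formula."""
    n = len(g)
    # NumPy returns the lower factor L with B + lam I = L L^T, so R = L^T.
    L = np.linalg.cholesky(B + lam * np.eye(n))
    p = -np.linalg.solve(L.T, np.linalg.solve(L, g))   # R^T R p = -g
    q = np.linalg.solve(L, p)                          # R^T q = p
    norm_p, norm_q = np.linalg.norm(p), np.linalg.norm(q)
    lam_chol = lam + (norm_p / norm_q) ** 2 * (norm_p - delta) / delta

    # Newton step on phi_2(lam) = 1/delta - 1/||p(lam)|| via the eigen-expansion.
    w, Q = np.linalg.eigh(B)
    c = Q.T @ g
    dnorm2 = -2.0 * np.sum(c ** 2 / (w + lam) ** 3)    # equation (6)
    phi2 = 1.0 / delta - 1.0 / norm_p
    dphi2 = 0.5 * norm_p ** (-3.0) * dnorm2            # equation (5)
    lam_newton = lam - phi2 / dphi2
    return lam_newton, lam_chol

B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
g = np.array([1.0, 2.0, 3.0])
lam_newton, lam_chol = newton_updates(B, g, lam=0.5, delta=0.7)
```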

Problem 7. Consider the problem (4.5) and its approximate solutions:

$p^C$: the Cauchy point;
$p^D$: approximation obtained by the dogleg method;
$p^T$: solution of the two-dimensional problem (4.17); and
$p^*$: exact solution of (4.5).

Prove that
$$m(p^*) \le m(p^T) \le m(p^D) \le m(p^C).$$
Proof:

If we solve problem (4.5) exactly, the admissible set is $S^\Delta = \{p \mid \|p\| \le \Delta\}$. Obviously, $p^T, p^D, p^C \in S^\Delta$. Since $p^*$ is the minimizer of (4.5) over $S^\Delta$, we have $m(p^*) \le m(p^T)$, $m(p^*) \le m(p^D)$ and $m(p^*) \le m(p^C)$.

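If we solve (4.5) in the two-dimensional subspace, the admissible set is $S^T = \{p \mid \|p\| \le \Delta,\ p \in \mathrm{span}\{g, B^{-1} g\}\}$. The point $p^C$ is the minimizer of (4.5) along the path $p = -\tau \frac{\Delta}{\|g\|} g$, $0 < \tau \le 1$, and $p^D$ is the minimizer of (4.5) along the path
$$p^D(\tau) = \begin{cases} -\tau \dfrac{g^T g}{g^T B g}\, g, & 0 < \tau \le 1, \\[6pt] -\dfrac{g^T g}{g^T B g}\, g + (\tau - 1)\left(-B^{-1} g + \dfrac{g^T g}{g^T B g}\, g\right), & 1 < \tau \le 2. \end{cases}$$
Both paths lie in $\mathrm{span}\{g, B^{-1} g\}$. Therefore, $p^D, p^C \in S^T$, and we have $m(p^T) \le m(p^D)$ and $m(p^T) \le m(p^C)$.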


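Let $p^U = -\frac{g^T g}{g^T B g}\, g$, which is the minimizer of $m$ along the steepest descent direction when the trust-region constraint is ignored. (i) If $\|p^U\| > \Delta$, then we have $\frac{g^T g}{g^T B g} \|g\| > \Delta$. Hence $\frac{\|g\|^3}{\Delta\, g^T B g} > 1$. This means $p^C = \frac{\Delta}{\|p^U\|} p^U$, and $p^C$ is on the dogleg path. Since $p^D$ is the minimizer along the dogleg path, we have $m(p^D) \le m(p^C)$. (ii) If $\|p^U\| \le \Delta$, then $p^C = p^U$, which is again a point of the dogleg path inside the trust region, so $m(p^D) \le m(p^C)$ as well.

In summary, $m(p^*) \le m(p^T) \le m(p^D) \le m(p^C)$. $\square$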
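Part of this chain can be checked numerically. The Python sketch below compares the Cauchy point, the dogleg point, and a nearly exact solution for several radii; the two-dimensional subspace solution is omitted to keep the example short. The function names, the bisection used for the exact solution, and the test matrix are assumptions made for illustration, and $B$ is assumed symmetric positive definite.

```python
import numpy as np

def model(B, g, p):
    return g @ p + 0.5 * p @ B @ p               # m(p) - f

def cauchy_point(B, g, delta):
    gnorm = np.linalg.norm(g)
    tau = min(1.0, gnorm ** 3 / (delta * (g @ B @ g)))
    return -tau * delta / gnorm * g

def dogleg_point(B, g, delta):
    p_u = -(g @ g) / (g @ B @ g) * g
    p_b = -np.linalg.solve(B, g)
    if np.linalg.norm(p_b) <= delta:
        return p_b
    if np.linalg.norm(p_u) >= delta:
        return -delta / np.linalg.norm(g) * g
    d = p_b - p_u                                # second leg hits the boundary
    a, b, c = d @ d, 2.0 * (p_u @ d), p_u @ p_u - delta ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_u + t * d

def exact_point(B, g, delta, iters=200):
    """Bisection on lambda so that ||(B + lambda I)^{-1} g|| = delta."""
    n = len(g)
    step = lambda lam: -np.linalg.solve(B + lam * np.eye(n), g)
    p = step(0.0)
    if np.linalg.norm(p) <= delta:
        return p
    lo, hi = 0.0, 1.0
    while np.linalg.norm(step(hi)) > delta:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi)

B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 1.0]])
g = np.array([1.0, -2.0, 0.5])
results = []
for delta in (0.1, 0.5, 1.0, 5.0):
    m_star = model(B, g, exact_point(B, g, delta))
    m_dog = model(B, g, dogleg_point(B, g, delta))
    m_cau = model(B, g, cauchy_point(B, g, delta))
    results.append((m_star, m_dog, m_cau))
```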

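Problem 8. The symmetric matrix $B$ has eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ with corresponding orthonormal eigenvectors $q_i$, $i = 1, 2, \dots, n$. When $q_1^T g = 0$ (the hard case in the textbook), consider
$$p(\tau) = -\sum_{i \ne 1} \frac{q_i^T g}{\lambda_i - \lambda_1}\, q_i + \tau q_1.$$

Find a $\tau$ s.t. $\|p(\tau)\| = \Delta$.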


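Proof: By definition, $p(\tau) = -\sum_{i \ne 1} \frac{q_i^T g}{\lambda_i - \lambda_1}\, q_i + \tau q_1$. Since the $q_i$ are orthonormal, we have
$$\|p(\tau)\|^2 = \left\| -\sum_{i \ne 1} \frac{q_i^T g}{\lambda_i - \lambda_1}\, q_i + \tau q_1 \right\|^2 = \sum_{i \ne 1} \frac{(q_i^T g)^2}{(\lambda_i - \lambda_1)^2} + \tau^2.$$
Since
$$\Delta^2 \ge \sum_{i \ne 1} \frac{(q_i^T g)^2}{(\lambda_i - \lambda_1)^2},$$
it is always possible to choose
$$\tau = \sqrt{\Delta^2 - \sum_{i \ne 1} \frac{(q_i^T g)^2}{(\lambda_i - \lambda_1)^2}}.$$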
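The formula for $\tau$ can be illustrated on a small example. The Python sketch below assumes that $\lambda_1$ is a simple eigenvalue and that $q_1^T g = 0$; the function name `hard_case_step` and the test data are hypothetical choices made for illustration.

```python
import numpy as np

def hard_case_step(B, g, delta):
    """Hard-case step p(tau) with ||p(tau)|| = delta (illustrative sketch).

    Assumes lambda_1 is simple, q_1^T g = 0, and that the part of the step
    orthogonal to q_1 has norm at most delta.
    """
    w, Q = np.linalg.eigh(B)                 # eigenvalues in ascending order
    c = Q.T @ g                              # c[i] = q_i^T g
    coeff = -c[1:] / (w[1:] - w[0])          # components along q_2, ..., q_n
    s = np.sum(coeff ** 2)
    if s > delta ** 2:
        raise ValueError("not the hard case: no real tau exists")
    tau = np.sqrt(delta ** 2 - s)
    return Q[:, 1:] @ coeff + tau * Q[:, 0], tau

B = np.diag([-2.0, 1.0, 3.0])
g = np.array([0.0, 1.0, 1.0])                # orthogonal to q_1
p, tau = hard_case_step(B, g, delta=1.0)
```

In this example $\lambda_1 = -2$, so the step also satisfies $(B - \lambda_1 I) p = -g$.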
