
Note Set 7 Nonlinear Equations

7.1 Overview

Often, we will be interested in solving a system of nonlinear equations, $f(x) = 0$. Nonlinear equations are generally hard problems.

7.2 Newton's Method

Newton's method proceeds by using the approximation,

$$f(x^*) \approx f(x) + f'(x)(x^* - x)$$

If $x^*$ is a solution, then we have $f(x^*) = 0$. Hence, we can solve $f(x) + f'(x)(x^* - x) = 0$ for $x^*$ to obtain,

$$x^* = x - f'(x)^{-1} f(x)$$
Now, let $J_k = f'(x_k)$ denote the Jacobian at iteration $k$. Newton's method proceeds by iterating,

$$x_{k+1} = x_k - J_k^{-1} f_k$$

Notice, in particular, that when $f_k$ is close to zero, the algorithm will suggest only a negligible correction.

As one might imagine, this method does not work well without modification. A globally convergent modification sets,

$$x_{k+1} = x_k - \lambda J_k^{-1} f_k$$

for $0 < \lambda \le 1$. As with unconstrained optimization, we will start from a full Newton step and backtrack if the new point does not result in a sufficient improvement. How do we measure improvement, though? The strategy we will take is to use some norm. The most likely choices are the L1 and L2 norms,

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad \|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$
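To make the backtracking idea concrete, the following is a minimal C++ sketch of a damped Newton iteration for a two-equation system. The example system, the Cramer's-rule linear solve, and all names are illustrative choices, not the course's solver.

#include <cmath>
#include <cstdio>

// Example system: f1 = x^2 + y^2 - 4, f2 = x*y - 1.
void evalF(const double x[2], double f[2]) {
    f[0] = x[0] * x[0] + x[1] * x[1] - 4.0;
    f[1] = x[0] * x[1] - 1.0;
}

// Analytic Jacobian of the example system.
void evalJ(const double x[2], double J[2][2]) {
    J[0][0] = 2.0 * x[0];  J[0][1] = 2.0 * x[1];
    J[1][0] = x[1];        J[1][1] = x[0];
}

double norm2(const double f[2]) { return std::sqrt(f[0] * f[0] + f[1] * f[1]); }

int main() {
    double x[2] = {2.0, 0.5};                        // starting guess
    for (int k = 0; k < 50; ++k) {
        double f[2], J[2][2];
        evalF(x, f);
        if (norm2(f) < 1e-10) break;                 // converged
        evalJ(x, J);
        // Full Newton step d solves J d = -f (Cramer's rule for the 2x2 case).
        double det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
        double d[2] = {(-f[0] * J[1][1] + f[1] * J[0][1]) / det,
                       (-f[1] * J[0][0] + f[0] * J[1][0]) / det};
        // Backtrack: halve lambda until the L2 norm of f improves.
        double lambda = 1.0;
        bool improved = false;
        while (lambda > 1e-8) {
            double xt[2] = {x[0] + lambda * d[0], x[1] + lambda * d[1]};
            double ft[2];
            evalF(xt, ft);
            if (norm2(ft) < norm2(f)) { x[0] = xt[0]; x[1] = xt[1]; improved = true; break; }
            lambda *= 0.5;
        }
        if (!improved) break;                        // line search failed; give up
    }
    std::printf("solution: (%g, %g)\n", x[0], x[1]);
    return 0;
}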

A natural question to ask is why we did not simply choose to apply unconstrained minimization from the beginning using the objective function,

$$g(x) = \|f(x)\|$$

It is true that $g(x)$ attains a global minimum where $f(x) = 0$. However, this function is likely to contain many other local minima. Choosing a direction based on the original problem generally reduces the likelihood that we will get stuck in a local minimum. Nonetheless, for problems that are not smooth enough for Newton's method to be effective, applying Nelder-Mead to minimize $g(x) = \|f(x)\|$ may be the most effective strategy.
The algorithm above, of course, requires that the Jacobian be provided analytically. If it is not available, we can compute it using finite differences,

$$J_{k,i} \approx \frac{f(x_k + h e_i) - f(x_k)}{h} \quad \text{for } i = 1, 2, \ldots, n$$

where $J_{k,i}$ denotes the $i$th column of $J_k$ and $e_i$ denotes the $i$th unit vector. Notice, in particular, that we require $n + 1$ function evaluations at each iteration. If analytical derivatives are available, we only require one function evaluation per iteration (not including backtracking evaluations).
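A minimal sketch of this forward-difference Jacobian, assuming a std::function-based interface; the names fdJacobian and Vec and the default step h are illustrative.

#include <vector>
#include <functional>

using Vec = std::vector<double>;
using Func = std::function<Vec(const Vec&)>;

// Forward-difference Jacobian: column i is (f(x + h*e_i) - f(x)) / h.
// This costs n + 1 evaluations of f, as noted above.
std::vector<Vec> fdJacobian(const Func& f, const Vec& x, double h = 1e-7) {
    Vec f0 = f(x);
    const std::size_t n = x.size();
    std::vector<Vec> J(f0.size(), Vec(n, 0.0));      // J[row][col]
    for (std::size_t i = 0; i < n; ++i) {
        Vec xp = x;
        xp[i] += h;                                  // perturb the i-th coordinate
        Vec fi = f(xp);
        for (std::size_t r = 0; r < f0.size(); ++r)
            J[r][i] = (fi[r] - f0[r]) / h;
    }
    return J;
}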
An alternative strategy is to develop secant methods. Secant methods solve the secant equation,

$$A_{k+1}(x_k - x_{k-1}) = f_k - f_{k-1}$$

As before, there are many solutions to this system. To pin down the solution, we will solve the following problem,

$$A_{k+1} = \mathop{\arg\min}_{A :\, A(x_k - x_{k-1}) = f_k - f_{k-1}} \|A - A_k\|_F$$

where $\|B\|_F$ denotes the Frobenius norm (which is simply the L2 norm on the vectorized version of $B$). The unique solution to this problem satisfies,

$$A_{k+1} = A_k + \frac{\left(f_k - f_{k-1} - A_k(x_k - x_{k-1})\right)(x_k - x_{k-1})'}{(x_k - x_{k-1})'(x_k - x_{k-1})}$$

This update is known as Broyden's update. Notice, in particular, that it is a rank-one update, which means an efficient implementation of Broyden's method will require only $O(n^2)$ worth of linear algebra operations per iteration. As before, we need a starting point. We take a slightly different approach than in unconstrained optimization here and typically start with $A_0 = f'(x_0)$.
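A sketch of the rank-one Broyden update, assuming the approximate Jacobian is stored as a flat row-major array; the storage scheme and function name are illustrative. The two nested loops make the $O(n^2)$ per-iteration cost explicit.

#include <vector>

// Broyden's rank-one update:
//   A <- A + ((df - A*ds) ds') / (ds'ds),
// where ds = x_k - x_{k-1} and df = f_k - f_{k-1}. A is n-by-n, row-major.
void broydenUpdate(std::vector<double>& A, int n,
                   const std::vector<double>& ds,
                   const std::vector<double>& df) {
    std::vector<double> r(n);                 // r = df - A * ds
    double denom = 0.0;
    for (int i = 0; i < n; ++i) {
        double Ads = 0.0;
        for (int j = 0; j < n; ++j) Ads += A[i * n + j] * ds[j];
        r[i] = df[i] - Ads;
        denom += ds[i] * ds[i];
    }
    // Rank-one correction: A += r * ds' / (ds'ds); only O(n^2) work.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            A[i * n + j] += r[i] * ds[j] / denom;
}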
While quasi-Newton methods are used almost exclusively for unconstrained optimization, for nonlinear equations Newton's method is much more commonly used than Broyden's method.
In order to implement Newton's method, we must evaluate the function as well as the Jacobian. For example, suppose that we would like to solve the system,

$$f(x, y) = \begin{pmatrix} x^2 + e^x + \sqrt{y} - 3 \\ xy - \log x - 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

In order to compute the Jacobian, we would first take analytic first derivatives of the above formula,

$$f'(x, y) = \begin{pmatrix} 2x + e^x & \dfrac{1}{2\sqrt{y}} \\[1ex] y - \dfrac{1}{x} & x \end{pmatrix}$$
In C++, we would supply the following function to the solver,

int Func1(const Vector<double> &x, Vector<double> &f, Matrix<double> &df)
{
    // Signal an error if the function or Jacobian cannot be evaluated at x
    // (log requires x > 0, and the derivative of sqrt(y) requires y > 0).
    if (x(0) <= 0.0 || x(1) <= 0.0) return 1;

    // Residuals.
    f(0) = x(0) * x(0) + exp(x(0)) + sqrt(x(1)) - 3.0;
    f(1) = x(0) * x(1) - log(x(0)) - 2.0;

    // Analytic Jacobian.
    df(0,0) = 2.0 * x(0) + exp(x(0));
    df(0,1) = 0.5 / sqrt(x(1));
    df(1,0) = x(1) - 1.0 / x(0);
    df(1,1) = x(0);
    return 0;
}

Here, we consider a number of special tricks. Consider solving the problem $f(x) = 0$ where $\partial f_i(x) / \partial x_j = 0$ for $|i - j| > 1$. If this is the case, then we have a band-diagonal Jacobian matrix. There are two types of efficiency gains that we can make here.
First, evaluating the Jacobian numerically would usually require $n + 1$ evaluations of the objective function, since each entry is approximated by,

$$J_{i,j} \approx \frac{f_i(x + e_j h) - f_i(x)}{h}$$

Instead, we can compute only $f(x + e_1 h + e_2 h + \ldots + e_n h)$, $f(x + e_2 h + \ldots + e_n h)$, and $f(x)$. We can then use,

$$J_{i,i} \approx \frac{f_i(x + e_1 h + e_2 h + \ldots + e_n h) - f_i(x)}{h}$$

$$J_{i,i+1} \approx \frac{f_i(x + e_1 h + e_2 h + \ldots + e_n h) - f_i(x + e_2 h + \ldots + e_n h)}{h}$$

$$J_{i,j} \approx 0 \quad \text{otherwise}$$
The second type of efficiency gain results from applying linear algebra routines that take
into account the band-diagonal structure of the Jacobian.
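To illustrate the second gain for the tridiagonal case, here is a sketch of the standard Thomas algorithm, which solves the Newton step's linear system in $O(n)$ operations rather than the $O(n^3)$ of a dense solve; the argument layout and names are illustrative assumptions.

#include <vector>

// Thomas algorithm for a tridiagonal system A x = b, where a = sub-diagonal,
// d = diagonal, c = super-diagonal (a[0] and c[n-1] are unused). Exploiting the
// band structure makes the linear solve in each Newton iteration O(n).
std::vector<double> thomasSolve(std::vector<double> a, std::vector<double> d,
                                std::vector<double> c, std::vector<double> b) {
    const std::size_t n = d.size();
    // Forward elimination.
    for (std::size_t i = 1; i < n; ++i) {
        double m = a[i] / d[i - 1];
        d[i] -= m * c[i - 1];
        b[i] -= m * b[i - 1];
    }
    // Back substitution.
    std::vector<double> x(n);
    x[n - 1] = b[n - 1] / d[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )
        x[i] = (b[i] - c[i] * x[i + 1]) / d[i];
    return x;
}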

7.3 Fixed Point Iterations

Consider the problem $f(x) = 0$. Let us define $g(x) = x + f(x)$. Notice that $f(x) = 0$ if and only if $g(x) = x$. This suggests the following algorithm for solving $f(x) = 0$: we start with some initial guess $x_0$ and we iterate $x_{k+1} = g(x_k)$ until $x_{k+1}$ and $x_k$ are sufficiently close. This algorithm is called fixed point iteration. It turns out this algorithm is quite bad for most nonlinear equation problems. It is, however, often a good algorithm for problems that arise in game-theoretic applications.
We can think of $x_{k+1}$ as being a best response. Hence, we are iteratively allowing
players to play best responses to each other. If this algorithm were to fail in any particular
game, it would mean that the equilibrium is unstable. Hence, there is good reason to
believe it can be effective in many game theoretic applications. In some game theoretic

applications, this method is orders of magnitude more efficient than Newton's method. It
is, however, less likely to converge.
There are some modifications we can suggest. First, we can try to dampen the iterations, and use $x_{k+1} = \lambda g(x_k) + (1 - \lambda) x_k$ where $0 < \lambda < 1$. Selecting $\lambda$ small enough can sometimes make this algorithm work, though it will generally slow it down. A good strategy is to start $\lambda$ at a high value and reduce it if it appears the iterates are diverging or if cycling is occurring. We can detect cycling if $x_{k+1}$ and $x_k$ are far apart, but $x_{k+1}$ and $x_{k-1}, x_{k-2}, x_{k-3}, x_{k-4}$ are close together. I find that a good strategy is to try fixed point iterations first on any game-theoretic problem, particularly if the relevant equations are best response functions.
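A minimal sketch of damped fixed point iteration with a simple convergence test; the interface, tolerance, and iteration cap are illustrative assumptions.

#include <vector>
#include <functional>
#include <algorithm>
#include <cmath>

// Damped fixed point iteration: x <- lambda*g(x) + (1 - lambda)*x.
// Returns true on convergence (successive iterates closer than tol in every coordinate).
bool fixedPoint(const std::function<std::vector<double>(const std::vector<double>&)>& g,
                std::vector<double>& x, double lambda = 1.0,
                double tol = 1e-10, int maxIter = 1000) {
    for (int k = 0; k < maxIter; ++k) {
        std::vector<double> gx = g(x);
        double diff = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            double xnew = lambda * gx[i] + (1.0 - lambda) * x[i];
            diff = std::max(diff, std::fabs(xnew - x[i]));
            x[i] = xnew;
        }
        if (diff < tol) return true;          // successive iterates are close
    }
    return false;                             // did not converge; try a smaller lambda
}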

7.4 Continuation Methods

Some nonlinear equations can be extremely difficult to solve. In fact, solving nonlinear equations is generally a harder problem than unconstrained optimization. Continuation methods (also called homotopy methods) are good tricks that we can use to solve otherwise hard-to-solve nonlinear systems. Consider a set of nonlinear problems indexed by a parameter $\theta$. Suppose that we want to solve,

$$f(x, \theta) = 0$$

for $x$. Suppose we are interested in solving a hard problem given by $f(x, 1) = 0$, but that $f(x, 0) = 0$ can be solved relatively easily. This suggests the following approach. Begin by solving $f(x, 0) = 0$. Use the solution as a starting point in solving $f(x, \Delta) = 0$. Use the solution to this as a starting point in solving $f(x, 2\Delta) = 0$, and so on, until we reach $\theta = 1$. Sometimes, this process can be very effective (at least when $f$ is continuous in $\theta$).
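A sketch of the continuation loop, assuming some inner solver (passed in as a callable) that refines x in place for a given value of the parameter; the equally spaced grid of parameter values is an illustrative choice.

#include <vector>
#include <functional>

// Continuation (homotopy) loop: solve f(x, theta) = 0 for theta stepping from 0 to 1,
// warm-starting each subproblem at the previous solution. 'solve' refines x in place
// for a given theta and reports whether it converged.
bool continuation(const std::function<bool(std::vector<double>&, double)>& solve,
                  std::vector<double>& x, int steps = 10) {
    for (int s = 0; s <= steps; ++s) {
        double theta = static_cast<double>(s) / steps;
        if (!solve(x, theta)) return false;   // a subproblem failed to converge
    }
    return true;                              // x now solves the hard problem f(x, 1) = 0
}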

7.5 A Discrete Game of Incomplete Information

Consider the following game. In the first stage, players 1 and 2 must decide whether to arm themselves or whether to agree to a default settlement of $(\pi, 1 - \pi)$. In order to build up their armies, each country must pay a cost $c_k$, which is drawn from the distribution $F_k$. In the event that only one country arms itself, it receives the entire surplus of 1, but must pay the cost $c_k$. The other country receives a utility of 0. In the event that both countries fight, country 1 wins the war with probability $p$, where $0 < p < 1$, and the surplus is discounted at rate $\delta < 1$.
This is a slightly more complicated version of a model we considered in Note Set 1.
We have the following utility functions,

Country 1 \ Country 2 | Fight                                      | Don't
Fight                 | $(\delta p - c_1,\ \delta(1-p) - c_2)$     | $(1 - c_1,\ 0)$
Don't                 | $(0,\ 1 - c_2)$                            | $(\pi,\ 1 - \pi)$

Let $\mu_1$ denote country 1's belief about the probability that country 2 fights. Then country 1's expected utility from fighting is given by,

$$U_1^F = \mu_1 (\delta p - c_1) + (1 - \mu_1)(1 - c_1) = 1 - c_1 - \mu_1 (1 - \delta p)$$

while country 1's utility from not fighting is given by,

$$U_1^D = \mu_1 \cdot 0 + (1 - \mu_1)\pi = \pi(1 - \mu_1)$$

Notice that $U_1^F \ge U_1^D$ if and only if the following condition holds,

$$c_1 \le 1 - \pi - \mu_1 (1 - \delta p - \pi)$$
Country 2's utilities are given by,

$$U_2^F = \mu_2 (\delta(1 - p) - c_2) + (1 - \mu_2)(1 - c_2) = 1 - c_2 - \mu_2 (1 - \delta(1 - p))$$

$$U_2^D = \mu_2 \cdot 0 + (1 - \mu_2)(1 - \pi) = (1 - \pi)(1 - \mu_2)$$

Country 2 will fight if and only if,

$$c_2 \le \pi - \mu_2 (\pi - \delta(1 - p))$$
We can see that, for a given set of beliefs about the other player's type, each player will use a cutpoint strategy,

$$x_1^*(c_1) = \begin{cases} F, & c_1 \le c_1^* \\ D, & c_1 > c_1^* \end{cases}, \qquad x_2^*(c_2) = \begin{cases} F, & c_2 \le c_2^* \\ D, & c_2 > c_2^* \end{cases}$$

Using rational expectations, we can conclude that the following two equations must be satisfied in equilibrium,

$$c_1^* = 1 - \pi - F_2(c_2^*)(1 - \delta p - \pi)$$

$$c_2^* = \pi - F_1(c_1^*)(\pi - \delta(1 - p))$$

We have here a system of nonlinear equations that we can solve using either Newton's method or fixed point iterations.

Now, let us pick some more interesting values. Suppose that $\pi = 2/3$, $\delta = 3/4$, and $p = 1/2$, and let $F_1(c) = F_2(c) = 1 - e^{-c}$ for $c \ge 0$. In example5.cpp, we solve this problem using Newton's method, Broyden's method, and fixed point iterations. We find that all three methods converge after a small number of iterations to the point (0.351657, 0.580217). This indicates that the country with the better status quo allocation is less likely to prepare for war.
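The listing below is not example5.cpp itself, but a minimal self-contained sketch of the fixed point iteration for the two cutpoint equations under the parameter values above.

#include <cmath>
#include <cstdio>

int main() {
    // Fixed point iteration on the equilibrium cutpoint equations
    //   c1 = 1 - pi - F2(c2) * (1 - delta*p - pi)
    //   c2 = pi - F1(c1) * (pi - delta*(1 - p))
    // with F1(c) = F2(c) = 1 - exp(-c), pi = 2/3, delta = 3/4, p = 1/2.
    const double pi = 2.0 / 3.0, delta = 0.75, p = 0.5;
    double c1 = 0.5, c2 = 0.5;                       // starting guess
    for (int k = 0; k < 200; ++k) {
        double F1 = 1.0 - std::exp(-c1);
        double F2 = 1.0 - std::exp(-c2);
        double c1new = 1.0 - pi - F2 * (1.0 - delta * p - pi);
        double c2new = pi - F1 * (pi - delta * (1.0 - p));
        bool done = std::fabs(c1new - c1) + std::fabs(c2new - c2) < 1e-12;
        c1 = c1new;
        c2 = c2new;
        if (done) break;                             // successive iterates agree
    }
    std::printf("cutpoints: (%f, %f)\n", c1, c2);
    return 0;
}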

7.6 Another Example

Consider the following model. There is a continuum of households characterized by their income $y$, with income distribution $F_y$. Each household chooses to reside in one of $J$ communities. The income tax rate in community $j$ is $t_j$, and this tax is used to provide a public good $g_j$. Each household's utility function is given by $u_j(y) = (1 - t_j) y + g_j^2$. Let $s_j$ be the number of residents in community $j$. Then a balanced budget is given by,

$$c s_j g_j = t_j \int_{C_j} y \, dF_y(y)$$

where $C_j$ denotes the set of incomes of community $j$'s residents.

We denote the boundaries by $B_j$. Individuals will be indifferent between communities $j$ and $j + 1$ if $(1 - t_j) y + g_j^2 = (1 - t_{j+1}) y + g_{j+1}^2$, or if

$$y = \frac{g_{j+1}^2 - g_j^2}{t_{j+1} - t_j}$$

so that $B_j = \dfrac{g_{j+1}^2 - g_j^2}{t_{j+1} - t_j}$. Notice that $s_j = F(B_j) - F(B_{j-1})$ and,

$$g_j = \frac{t_j \displaystyle\int_{\frac{g_j^2 - g_{j-1}^2}{t_j - t_{j-1}}}^{\frac{g_{j+1}^2 - g_j^2}{t_{j+1} - t_j}} y \, dF_y(y)}{c \left( F\!\left( \dfrac{g_{j+1}^2 - g_j^2}{t_{j+1} - t_j} \right) - F\!\left( \dfrac{g_j^2 - g_{j-1}^2}{t_j - t_{j-1}} \right) \right)}$$

Now, notice that all the parameters of the model are known except $(g_1, \ldots, g_J)$. Hence, we can solve for the equilibrium public good levels using a nonlinear solver. Notice, in addition, that the system has a band-diagonal structure (e.g., the $j$th equation only depends on $g_{j-1}$, $g_j$, and $g_{j+1}$). This means that we can economize on the computation of the Jacobian.
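As an illustration, here is a sketch of the residual function one might hand to a nonlinear solver, under the simplifying assumptions that income is uniform on [0, 1] (so $F(y) = y$ and the tax-base integral has a closed form) and that the tax rates are given and strictly increasing; all names are illustrative.

#include <vector>
#include <algorithm>

// Residual of the balanced-budget conditions for g = (g_1, ..., g_J), with
// communities indexed 0..J-1 in code. The j-th residual depends only on
// g[j-1], g[j], g[j+1], which is the band-diagonal structure noted above.
std::vector<double> residual(const std::vector<double>& g,
                             const std::vector<double>& t, double c) {
    const std::size_t J = g.size();
    // Community boundaries: B[0] = 0 and B[J] = 1 are the income bounds;
    // interior boundaries come from the indifference condition.
    std::vector<double> B(J + 1);
    B[0] = 0.0;
    B[J] = 1.0;
    for (std::size_t j = 1; j < J; ++j)
        B[j] = std::clamp((g[j] * g[j] - g[j - 1] * g[j - 1]) / (t[j] - t[j - 1]), 0.0, 1.0);
    std::vector<double> r(J);
    for (std::size_t j = 0; j < J; ++j) {
        double share = B[j + 1] - B[j];                              // s_j = F(B_j) - F(B_{j-1})
        double taxBase = 0.5 * (B[j + 1] * B[j + 1] - B[j] * B[j]);  // integral of y dF_y(y)
        r[j] = c * share * g[j] - t[j] * taxBase;                    // balanced-budget residual
    }
    return r;
}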

7.7 - Suggested Reading

[1] Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes in C.

[2] Dennis and Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations.

[3] Ken Judd, Numerical Methods in Economics.

