
MT5802 - Calculus of variations

Introduction.

Suppose y(x) is defined on the interval [a, b] and so defines a curve on the (x, y) plane.
Now suppose

I = ∫_a^b F(y, y′, x) dx    (1)
with y′ the derivative of y(x). The value of this will depend on the choice of the
function y and the basic problem of the calculus of variations is to find the form of the
function which makes the value of the integral a minimum or maximum (most commonly
a minimum).
The sort of question which gives rise to this kind of problem is exemplified by the
“Brachistochrone” problem, solved by Newton and the Bernoullis (the name comes from
the Greek for “shortest time”). This considers a particle sliding down a smooth curve
under the action of gravity and poses the question as to what curve minimises the time for
the particle to slide between fixed points A and B.

Clearly the time will need to be found by calculating the speed at each point then
integrating along the curve.
Other examples arise in various areas of physics in which the basic laws can be stated in
terms of variational principles. For example, in optics Fermat's principle says that the path of a light ray between two points is such as to minimise the time of travel between the two points.

The Euler-Lagrange equation.

First recall the condition under which an ordinary function y(x) has an extremum. If we
expand in a Taylor series
y(x + δx) = y(x) + δx y′(x) + ½ δx² y″(x) + …
then the condition is that the term proportional to δx must vanish, so that if the second
derivative is non-zero the difference between y(x + δx) and y(x) will always have the
same sign for small δx . The same principle applies to our present problem.
What we do is consider a small change in the function y(x) , replacing it with
y(x) + η(x). (Note that all the functions we introduce are assumed to have appropriate properties of differentiability etc., without particular comment being made.) We then
produce a change in the integral, which can be expanded in powers of η . We demand
that the term proportional to η vanishes.
Substituting into (1) we get
I(y + η) = ∫_a^b F(y + η, y′ + η′, x) dx

= ∫_a^b F(y, y′, x) dx + ∫_a^b [(∂F/∂y)η + (∂F/∂y′)η′] dx + O(η²)
so that what we want is
∫_a^b [(∂F/∂y)η + (∂F/∂y′)η′] dx = 0.    (2)
Integrating the second term by parts gives
∫_a^b [∂F/∂y − d/dx(∂F/∂y′)] η(x) dx = 0.    (3)
In obtaining this we have assumed that η(a) = η(b) = 0, i.e. the perturbation vanishes at the end points, leaving the end points A and B of the curve unchanged, as shown below.

[Figure: the unperturbed curve (full line) and the perturbed curve (dotted line), joining the same end points]

Since this must hold for all η(x), the factor in square brackets must vanish identically and we obtain


∂F/∂y − d/dx(∂F/∂y′) = 0.    (4)
This is the Euler-Lagrange equation, the basic equation of this theory. It is a differential
equation which determines y as a function of x .

Examples.

(a) Find the curve which gives the shortest distance between two points on a plane.

If the curve is y = y(x) then the element of length is

dl = √(dx² + dy²) = √(1 + y′²) dx
so we want to minimise
∫_a^b √(1 + y′²) dx
(where a and b are the x-coordinates of the points of interest).
The integrand is independent of y so we just get
d  ∂ 
 1 + y ′2  = 0
dx  ∂y ′ 
y′
giving = const , or y ′ = const . As expected this just gives a straight line
1 + y ′2
y = mx + c with the constants fixed by the positions of the end points.
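This calculation is easily checked by computer algebra. As a minimal sketch, assuming SymPy and its euler_equations helper, we can generate the Euler-Lagrange equation for the arc-length integrand directly:

    import sympy as sp
    from sympy.calculus.euler import euler_equations

    x = sp.symbols('x')
    y = sp.Function('y')

    # Arc-length integrand from example (a)
    F = sp.sqrt(1 + y(x).diff(x)**2)

    # Returns the Euler-Lagrange equation; the result is equivalent to
    # y'' = 0, i.e. a straight line.
    print(euler_equations(F, [y(x)], x))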

(b) Find the curve which minimises


∫_a^b (y² + y′²) dx
The Euler-Lagrange equation for this is


d/dx(2y′) − 2y = 2y″ − 2y = 0
and if we multiply by y′ we get a first integral y′² − y² = const. Assuming this constant to be positive and equal to a² we get the solution y = a sinh(x + b). If the constant is negative we can take it to be −a² and get the solution y = a cosh(x + b). In both cases a and b are constants to be found from the given end points.

This is a fairly simple, artificial example, but it illustrates a more general point. Note that we could easily find a first integral and reduce the problem to a first-order DE. The existence of a first integral like this turns out to be a general property of the Euler-Lagrange equation whenever the integrand has no explicit dependence on x.
Under these circumstances, if we multiply the E-L equation by y′ we get

y′(∂F/∂y) − y′ d/dx(∂F/∂y′) = 0

or

d/dx[y′(∂F/∂y′)] − y″(∂F/∂y′) − y′(∂F/∂y) = 0.

Since F does not contain x explicitly, dF/dx = y′(∂F/∂y) + y″(∂F/∂y′), so the last two terms combine to give −dF/dx, the total derivative. So, we get the first integral
y′(∂F/∂y′) − F = const.    (5)
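This identity can also be verified symbolically. The following sketch (assuming SymPy; the choice of integrand is just for illustration) checks that the derivative of the left-hand side of (5) is proportional to the Euler-Lagrange expression, and so vanishes along any solution:

    import sympy as sp

    x = sp.symbols('x')
    y = sp.Function('y')
    yp = y(x).diff(x)

    # Any integrand with no explicit x-dependence will do; this is example (b)'s.
    F = yp**2 + y(x)**2

    first_integral = yp*sp.diff(F, yp) - F
    EL = sp.diff(F, y(x)) - sp.diff(sp.diff(F, yp), x)

    # d/dx (y' dF/dy' - F) = -y' * (E-L expression), hence constant on solutions
    print(sp.simplify(first_integral.diff(x) + yp*EL))   # -> 0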

As a more interesting example we return to the brachistochrone problem mentioned earlier.
Suppose two points A and B are connected by a smooth ramp along which a particle can
slide, starting at rest at A. Taking A at the origin and the y direction vertically
downwards, then at a point (x,y) on the curve, the particle speed is given by
v² = 2gy
(with g the acceleration due to gravity). The time to move an increment (dx,dy) along
the curve is
dt = √(dx² + dy²)/v = √((1 + y′²)/(2gy)) dx
So, the integral which we need to minimise is
∫_a^b √((1 + y′²)/y) dx

(where the constant factor 1/√(2g) has been dropped, since it does not affect the minimisation)
and the first integral of the Euler-Lagrange equation as derived above (Eq. (5)) is
y′ · y′/√(y(1 + y′²)) − √((1 + y′²)/y) = c.

This simplifies to (with k = −1/c)

√(y(1 + y′²)) = k

or

y′ = √((k² − y)/y).
This can be integrated by making the substitution y = k² sin²θ, giving

dy/dx = 2k² sin θ cos θ (dθ/dx) = cos θ/sin θ

so that dx = 2k² sin²θ dθ = k²(1 − cos 2θ) dθ, which has the solution

x = (k²/2)(2θ − sin 2θ) + K.
Putting b = k²/2 and φ = 2θ we get parametric equations for the curve in the form

x = b(φ − sin φ) + K
y = b(1 − cos φ).
As illustrated by the diagram below, these represent a cycloid, the curve traced out by a
point on the circumference of a wheel of radius b rolling along the x axis.

[Figure: the cycloid through the point A, traced out by a point on a circle rolling along the x axis]

Since the curve passes through the origin, K = 0 . The value of b is determined by the
condition that the curve passes through B.
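For given coordinates of B this is a routine numerical root-finding problem. A minimal sketch (the end point B = (1, 0.6), with y measured downwards as above, and the use of SciPy's brentq root-finder are illustrative choices):

    import numpy as np
    from scipy.optimize import brentq

    X, Y = 1.0, 0.6   # coordinates of B

    # Eliminate b between x = b(phi - sin phi) and y = b(1 - cos phi):
    # (phi - sin phi)/(1 - cos phi) must equal X/Y.
    f = lambda phi: (phi - np.sin(phi))/(1.0 - np.cos(phi)) - X/Y

    phi_B = brentq(f, 1e-6, 2*np.pi - 1e-6)   # the ratio is increasing on (0, 2*pi)
    b = Y/(1.0 - np.cos(phi_B))
    print(phi_B, b)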

More than one dependent variable.

Suppose F = F(y₁, y₁′, y₂, y₂′, y₃, y₃′, …) with each yᵢ = yᵢ(x) and again we are looking for an extremum of ∫_a^b F dx. The analysis proceeds as before, replacing each yᵢ with yᵢ + ηᵢ. Since each ηᵢ can be chosen independently, we must let the coefficient of each in the integrand vanish. We end up with a system of Euler-Lagrange equations
∂F/∂yᵢ = d/dx(∂F/∂yᵢ′).    (6)
It has, of course, been assumed that the end points are fixed, as before.
Example: Find the curve which minimises
∫₀¹ (y′² + z′² + y²) dx
and which joins the points (0, 0, 0) and (1,1,1) .

The E-L equations are

d/dx(2y′) − 2y = 0
d/dx(2z′) = 0
with general solutions
y = a cosh x + b sinh x
z = cx + d.
Imposing the end point conditions gives the curve
y = sinh x / sinh 1
z = x.
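As a check, the boundary-value problem for y can be handed to a computer algebra system. A sketch assuming SymPy's dsolve with the boundary conditions supplied through ics:

    import sympy as sp

    x = sp.symbols('x')
    y = sp.Function('y')

    # y'' - y = 0 with y(0) = 0, y(1) = 1
    sol = sp.dsolve(y(x).diff(x, 2) - y(x), y(x), ics={y(0): 0, y(1): 1})

    # The solution should agree with sinh(x)/sinh(1)
    print(sp.simplify(sol.rhs - sp.sinh(x)/sp.sinh(1)))   # -> 0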

Hamilton’s Principle

Suppose a conservative dynamical system is described by coordinates (q₁, q₂, …, qₙ) and the rates of change of these are q̇ᵢ (i = 1, …, n). Then the kinetic energy of the system is, in general, T(q₁, …, qₙ; q̇₁, …, q̇ₙ) and the potential energy is V(q₁, …, qₙ). The Lagrangian is then defined by L = T − V and Hamilton's principle states that along the particle orbit the integral

∫_{t₁}^{t₂} L dt
has an extremum. This gives rise to the set of equations of motion


d  ∂L  ∂L
  − = 0,
dt  ∂q&  ∂q
usually known in this context as Lagrange’s equations. These can be derived from
Newton’s laws of motion and then Hamilton’s principle becomes a deduction from them.
For complicated systems Lagrange’s equations are usually easier to handle than any
attempt to work out the equations of motion directly from Newton’s equations.
Example: Find the equations of motion for the double pendulum system shown below.

[Figure: double pendulum; the upper bob, mass m, hangs at distance a from the support at angle θ to the vertical, and the lower bob, mass M, hangs at distance b below it at angle φ]

The height of the top bob above its equilibrium position is a(1 − cos θ) and the height of
the lower bob above its equilibrium is a(1 − cos θ) + b(1 − cos φ) . So
V = mga(1 − cos θ) + Mg[a(1 − cos θ) + b(1 − cos φ)].
The horizontal component of velocity of the top bob is aθ̇ cos θ and the vertical component aθ̇ sin θ. For the lower bob the corresponding components are aθ̇ cos θ + bφ̇ cos φ and aθ̇ sin θ + bφ̇ sin φ and so
T = ½ma²θ̇² + ½M[(aθ̇ cos θ + bφ̇ cos φ)² + (aθ̇ sin θ + bφ̇ sin φ)²]
  = ½ma²θ̇² + ½M[a²θ̇² + b²φ̇² + 2abθ̇φ̇ cos(θ − φ)].
From Lagrange's equations we then get

d/dt[(m + M)a²θ̇ + Mabφ̇ cos(θ − φ)] + Mabθ̇φ̇ sin(θ − φ) + (m + M)ga sin θ = 0

d/dt[Mb²φ̇ + Mabθ̇ cos(θ − φ)] − Mabθ̇φ̇ sin(θ − φ) + Mgb sin φ = 0.
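Deriving such equations by hand is error-prone, and this is exactly where a computer algebra system earns its keep. A sketch assuming SymPy, forming L = T − V as above and applying Lagrange's equations directly:

    import sympy as sp

    t, m, M, a, b, g = sp.symbols('t m M a b g', positive=True)
    th = sp.Function('theta')(t)
    ph = sp.Function('phi')(t)
    thd, phd = th.diff(t), ph.diff(t)

    # Kinetic and potential energies as in the text
    T = (sp.Rational(1, 2)*m*a**2*thd**2
         + sp.Rational(1, 2)*M*((a*thd*sp.cos(th) + b*phd*sp.cos(ph))**2
                                + (a*thd*sp.sin(th) + b*phd*sp.sin(ph))**2))
    V = m*g*a*(1 - sp.cos(th)) + M*g*(a*(1 - sp.cos(th)) + b*(1 - sp.cos(ph)))
    L = T - V

    # d/dt(dL/dq') - dL/dq = 0 for q = theta, phi;
    # the cross terms involve cos(theta - phi)
    for q in (th, ph):
        print(sp.Eq(sp.trigsimp(sp.diff(L, q.diff(t)).diff(t) - sp.diff(L, q)), 0))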

Problems with constraints.

Recall that to find the extremum of a function of several variables with constraints
imposed we use Lagrange’s method of undetermined multipliers. An exact analogy holds
in the case of calculus of variations. Suppose we want to find the extremum of
I = ∫_a^b F(y, y′, x) dx
subject to the condition
H = ∫_a^b G(y, y′, x) dx = const.
Then we apply the Euler-Lagrange equation to F − λG (or F + λG if you prefer) with λ an undetermined multiplier which is determined by the constraint and the end points.

Example: A heavy chain with constant mass per unit length is suspended between two points.
What curve does it take up in equilibrium?

The equilibrium condition is such as to minimise gravitational potential energy.

[Figure: a heavy chain hanging between two points whose x-coordinates are a and b]

With the geometry shown this means that we minimise ∫_a^b y dl where the element of length is given by dl = √(1 + y′²) dx. There is also the constraint that the total length is fixed, so we must minimise
∫_a^b y√(1 + y′²) dx
subject to
∫_a^b √(1 + y′²) dx = const.

Thus we apply the Euler-Lagrange equation to (y − λ)√(1 + y′²). Since this has no explicit dependence on x we can use the result we obtained already (Eq. (5)) to get a first integral, namely,

y′ ∂/∂y′[(y − λ)√(1 + y′²)] − (y − λ)√(1 + y′²) = k
from which we get

y′² = ((y − λ)/k)² − 1.
If we make the substitution y − λ = k cosh z this gives z′ = 1/k, so that z = (x + c)/k and we obtain

y = λ + k cosh((x + c)/k).
The three constants k, c and λ are obtained from the coordinates of the end points and the
length. This curve is called a catenary.
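Finding the constants for given data is a routine numerical exercise. A sketch (the end points, the chain length and the use of SciPy's fsolve are illustrative choices):

    import numpy as np
    from scipy.optimize import fsolve

    xA, yA, xB, yB, ell = 0.0, 1.0, 2.0, 1.0, 3.0   # end points and chain length

    def equations(p):
        k, c, lam = p
        f1 = lam + k*np.cosh((xA + c)/k) - yA          # passes through A
        f2 = lam + k*np.cosh((xB + c)/k) - yB          # passes through B
        # total length: integral of cosh((x+c)/k) from xA to xB
        f3 = k*(np.sinh((xB + c)/k) - np.sinh((xA + c)/k)) - ell
        return [f1, f2, f3]

    k, c, lam = fsolve(equations, [1.0, -1.0, 0.0])
    print(k, c, lam)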

If there is more than one constraint then we introduce more than one multiplier.
Example: In statistical mechanics, the distribution of energy of a system of particles is
described by a probability distribution function f(E). In equilibrium, theory says that this distribution should be such as to maximise the function

−∫₀^∞ f log f dE
subject to the conditions
∫₀^∞ f(E) dE = 1,   ∫₀^∞ E f(E) dE = E₀.
The first of these is just the standard normalisation condition on a probability distribution. The second, with E₀ a given constant, says that the average energy per particle, or equivalently the total energy of the system, is fixed.
The Euler-Lagrange equation with these constraints is

∂/∂f (−f log f − λf − µEf) = 0
with λ and µ the two multipliers corresponding to the two constraints. This yields

f = Ce^(−µE)

where C is a constant into which λ has been incorporated. Using the two constraints gives µ = 1/E₀ and C = 1/E₀. This is the Boltzmann distribution and E₀ is proportional to the temperature of the system.
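The two constraint integrals are easily confirmed. A sketch assuming SymPy:

    import sympy as sp

    E, E0 = sp.symbols('E E_0', positive=True)
    f = sp.exp(-E/E0)/E0     # Boltzmann distribution with mu = 1/E_0, C = 1/E_0

    print(sp.integrate(f, (E, 0, sp.oo)))       # -> 1
    print(sp.integrate(E*f, (E, 0, sp.oo)))     # -> E_0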

The isoperimetric problem - find the shape which has maximum area for a given
perimeter.

Suppose the parametric equations of the required curve are

x = x(t),  y = y(t),  t₀ ≤ t ≤ t₁

with x(t₀) = x(t₁), y(t₀) = y(t₁), so that we have a closed curve. The length is

L = ∫_{t₀}^{t₁} (ẋ² + ẏ²)^{1/2} dt

and is fixed. The area enclosed is


A = ∬ dx dy, taken over the enclosed region,
which, by means of Green’s theorem, can be expressed as the line integral
A = ½ ∫_{t₀}^{t₁} (xẏ − yẋ) dt.

So, we construct the Euler-Lagrange equations for the two variables x and y from the
function
φ(x, y, ẋ, ẏ) = ½(xẏ − yẋ) − λ(ẋ² + ẏ²)^{1/2}.
These equations are

d/dt[−y/2 − λẋ/(ẋ² + ẏ²)^{1/2}] − ẏ/2 = 0

d/dt[x/2 − λẏ/(ẋ² + ẏ²)^{1/2}] + ẋ/2 = 0.
These can be integrated immediately to give
y + λẋ/(ẋ² + ẏ²)^{1/2} = A

x − λẏ/(ẋ² + ẏ²)^{1/2} = B
Multiplying the first of these by ẏ and the second by ẋ and adding gives

ẋ(x − B) + ẏ(y − A) = 0
This has the integral

(x − B)² + (y − A)² = const.
so that the required curve is a circle.
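It is straightforward to confirm that a circle of radius R centred on (B, A) satisfies both first integrals, with the multiplier λ equal to R. A sketch assuming SymPy:

    import sympy as sp

    t, R, A, B = sp.symbols('t R A B', positive=True)
    x = B + R*sp.cos(t)
    y = A + R*sp.sin(t)
    xd, yd = x.diff(t), y.diff(t)
    speed = sp.sqrt(sp.simplify(xd**2 + yd**2))   # -> R

    lam = R   # the multiplier turns out to be the radius
    print(sp.simplify(y + lam*xd/speed))   # -> A
    print(sp.simplify(x - lam*yd/speed))   # -> B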

Geodesics

If G(x,y,z) = 0 defines a surface in three dimensional space, then the geodesics on this
surface are the curves which produce the shortest distance between points on the surface.
So, as we have seen, the geodesics on a plane are just straight lines. We can cast the
problem of finding geodesics on a surface into a variational problem with a constraint as
follows. If x = x(t),y = y(t),z = z(t) are parametric equations for a curve on the
surface, then along any curve on the surface
∫_{t₀}^{t₁} G(x(t), y(t), z(t)) dt = 0    (7)
Since the element of length along a curve is √(ẋ² + ẏ² + ż²) dt, the problem is to minimise

∫_{t₀}^{t₁} √(ẋ² + ẏ² + ż²) dt

subject to the constraint (7). This gives


d  x&  ∂G
  − λ =0
dt  F  ∂x

plus similar equations with for y and z , with F = x& 2 + y& 2 + z& 2 .

As a particular example consider geodesics on the sphere, for which


G = x² + y² + z² − R²

and so the equations for the geodesic are

[d/dt(ẋ/F)]/(2x) = [d/dt(ẏ/F)]/(2y) = [d/dt(ż/F)]/(2z) = λ.
Expanding the derivatives in the first equation gives

(ẍF − ẋḞ)/(2xF²) = (ÿF − ẏḞ)/(2yF²)

which can be rearranged into

(yẍ − xÿ)/(yẋ − xẏ) = Ḟ/F.
In a similar way,

(zÿ − yz̈)/(zẏ − yż) = Ḟ/F.
We now equate these two expressions for Ḟ/F and write the result in the form

[d/dt(yẋ − xẏ)]/(yẋ − xẏ) = [d/dt(zẏ − yż)]/(zẏ − yż)
which integrates to give
yẋ − xẏ = C₁(zẏ − yż).
Writing this in the form

(ẋ + C₁ż)/(x + C₁z) = ẏ/y

we can integrate again to get

x + C₁z = C₂y.
This is the equation of a plane passing through the origin. So, the geodesics on a sphere
are the curves formed by the intersection of the sphere and planes through its centre.
These are the great circles on the sphere.

Estimate of an eigenvalue using a variational method.

Suppose we have a problem

d/dx(p(x)y′) + q(x)y + λr(x)y = 0,   y(0) = y(1) = 0.    (8)
This will obviously have the trivial solution y = 0, but for certain values of λ (the eigenvalues) there will be non-trivial solutions. If we consider the calculus of variations problem of minimising
I = ∫₀¹ {p(x)y′² − q(x)y²} dx
subject to the condition that
J = ∫₀¹ r(x)y² dx = const.

and the given boundary conditions on y , then we obtain the above equation from the
Euler-Lagrange equations and the method of multipliers. The lowest possible eigenvalue
is then the minimum possible value of I/J. Since J is constrained to be constant this is just
equivalent to the problem of minimising I subject to J being constant. The standard
approach to this leads back to the DE and we may appear to be going round in circles.
The usefulness of this approach is that if we use any trial function y(x) satisfying the boundary conditions then the resulting value of I/J is greater than or equal to the minimum possible, so we obtain an upper bound on the lowest eigenvalue. With a choice of y which is a reasonable approximation to the true solution we can get a good estimate.

Example: Use this technique to find an estimate of the lowest eigenvalue of the problem
y″ + λy = 0,   y(0) = y(1) = 0

This is, of course, a problem to which we know the solution, namely that the eigenvalues are given by λ = n²π², so the lowest is π² ≈ 9.8696 (corresponding to the solution y = sin(πx)). We want a trial function with the required end values, and preferably one which is easily integrated (though numerical integration is readily done with a package like MAPLE). Let us take y = x(1 − x). Then p(x) = 1, q(x) = 0, r(x) = 1 and so
I = ∫₀¹ (1 − 2x)² dx = 1/3

J = ∫₀¹ x²(1 − x)² dx = 1/30
giving an upper bound of 10, which is actually a fairly good approximation to the lowest
eigenvalue.
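The two integrals take only a couple of lines in a computer algebra system. A sketch assuming SymPy:

    import sympy as sp

    x = sp.symbols('x')
    y = x*(1 - x)            # trial function with y(0) = y(1) = 0

    I = sp.integrate(sp.diff(y, x)**2, (x, 0, 1))   # -> 1/3
    J = sp.integrate(y**2, (x, 0, 1))               # -> 1/30
    print(I/J)                                       # -> 10, an upper bound for pi**2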

Variants on the boundary conditions are possible, for example in the following.

Example: Find the lowest eigenvalue for the problem


d/dx(xy′) + λxy = 0

with y′(0) = y(1) = 0.

A simple function satisfying the boundary conditions is y = 1 − x², for which

I = ∫₀¹ x(−2x)² dx = 1

J = ∫₀¹ x(1 − x²)² dx = 1/6
giving an upper bound of 6. One way of making this procedure more accurate is to introduce one or more unknown parameters into the assumed function. For example, here we could take y = (1 − x²)(1 + cx²), which retains the correct boundary conditions. Then
I/J = (1 + (2/3)c + (1/3)c²) / (1/6 + (1/12)c + (1/60)c²).
The point of the exercise is that this gives an upper bound for any value of c. So, if we minimise this expression with respect to c we will get the best possible estimate for this form of y. Differentiating with respect to c we get the condition for a minimum that 3c² + 14c + 5 = 0, and the root which gives a minimum is c ≈ −0.390. The corresponding value of I/J is 5.784. The equation has a solution which is a Bessel function and from this the lowest eigenvalue can be calculated to be 5.783.
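The algebra above can be delegated to a computer algebra system. A sketch assuming SymPy, which reproduces the stationary values of the bound (the smaller one is the minimum, about 5.784):

    import sympy as sp

    x, c = sp.symbols('x c', real=True)
    y = (1 - x**2)*(1 + c*x**2)      # trial function with y'(0) = y(1) = 0

    I = sp.integrate(x*sp.diff(y, x)**2, (x, 0, 1))
    J = sp.integrate(x*y**2, (x, 0, 1))
    ratio = sp.cancel(I/J)

    # Stationary points of the bound as a function of c
    for cv in sp.solve(sp.diff(ratio, c), c):
        print(sp.N(cv, 4), sp.N(ratio.subs(c, cv), 6))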
The technique of introducing unknown parameters and minimising with respect to them (the Rayleigh-Ritz method) is very useful whenever the minimum value of an integral is needed. Note that even in this simple example the algebra is tedious, so the use of a computer algebra system is a big help.
In many problems (e.g. finding the ground state energy in a quantum system) the lowest
eigenvalue is all that is needed. It is possible to extend this technique to get higher
eigenvalues, but we shall not pursue this here.
Variable end points

Suppose we relax the condition that the values of y be fixed at the end points but instead
assume that they are allowed to vary freely. Then, following the procedure which led to
Eq. (3) we obtain, as well as the term in (3), an extra contribution
(η ∂F/∂y′)|_{x=b} − (η ∂F/∂y′)|_{x=a}.

Since the extremum, if it exists, must be the extreme value for whatever end points turn out to be suitable, the integral term must vanish as before. Otherwise a slightly greater or smaller value for the integral could be obtained by taking a different curve with the same end points. This extra term must also vanish which, since η is arbitrary, means that

∂F/∂y′ = 0
at both end points.
As a simple example we can consider the problem of minimising the distance between
x = 0 and x = 1 without fixing y at the end points. Here
F = √(1 + y′²)
and the solution of the E-L equation is a straight line as before. The extra conditions
yield
y′/√(1 + y′²) = 0
at both ends. The derivative must then be zero everywhere, so we arrive at the expected
result that the shortest line between x = 0 and x = 1 is a straight line parallel to the x
axis.
Another variant of the problem is to consider a case where the end points are constrained
to lie on a given curve. For simplicity, let us assume that the lower end point is fixed while the upper end point has to lie on the curve y = g(x). Suppose that the extremum has its upper limit at x = b, while the lower limit x = a is fixed. Then, if y is replaced with y + η as before, there is a change in the upper limit of the integral to b + ∆x, say.
[Figure: the extremal y(x) between x = a and x = b, and the varied curve y + η whose upper end point slides along the curve y = g(x) to x = b + ∆x]

The corresponding change in y is

∆y = y(b + ∆x) + η(b + ∆x) − y(b) ≈ ∆x y′(b) + η(b).
However, there is also the constraint that the end point lies on the given curve, which
gives ∆y ≈ g ′(b)∆x . Putting these relations together we get
∆x = η(b)/(g′(b) − y′(b)).
Now, look at the change in the integral:

∆I = ∫_a^{b+∆x} F(y + η, y′ + η′, x) dx − ∫_a^b F(y, y′, x) dx

   ≈ ∫_a^{b+∆x} [(∂F/∂y)η + (∂F/∂y′)η′] dx + ∫_b^{b+∆x} F(y, y′, x) dx.
In the first integral here, which contains the small perturbation η , we can neglect the
change in the upper limit. Then if we integrate by parts we get the usual integral
containing the E-L expression, plus a contribution
(η ∂F/∂y′)|_{x=b}.
The second integral is approximately
F(y, y′, b) ∆x = F(y, y′, b) η(b)/(g′(b) − y′(b)).
For ∆I to vanish for arbitrary η we require that the E-L equation be satisfied and also
that
∂F/∂y′ + F/(g′ − y′) = 0    (9)
at the upper limit. If the lower limit had been constrained also to lie on a curve rather
than being fixed, a condition analogous to (9) would apply there also.

Example: Find the curve of shortest length connecting the origin to a given curve y = g(x).

Here, as we have seen before, F = √(1 + y′²) and the solution of the E-L equation gives a
straight line. For this case, the condition (9), which must be satisfied where the straight
line meets the given curve, reduces to g ′y ′ = −1 , implying that the straight line giving
the shortest distance to the curve must be orthogonal to the curve at the point of
intersection.
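This orthogonality is easy to confirm for a particular case. A sketch assuming SymPy, with the target curve g(x) = 2 − x as an illustrative choice, minimising the distance from the origin directly and checking that g′y′ = −1 at the minimiser:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    g = 2 - x                                 # target curve

    # Squared distance from the origin to the point (x, g(x))
    d2 = x**2 + g**2
    xstar = sp.solve(sp.diff(d2, x), x)[0]    # -> 1

    # Slope of the minimising straight line from the origin, times g'(x*)
    yprime = g.subs(x, xstar)/xstar
    print(sp.simplify(yprime*sp.diff(g, x).subs(x, xstar)))   # -> -1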

Some further comments

In the case of the simple problem of finding a local maximum or minimum of a differentiable function we know that vanishing of the derivative is a necessary condition,
but that it is not sufficient. The derivative can vanish but the point can be a point of
inflection rather than a turning point. For calculus of variations problems the situation is
similar, in that the Euler-Lagrange equation is a necessary, but not sufficient, condition
for an extremum. In the case of a function, the nature of the critical point is easily
determined by identifying the lowest non-zero derivative. The analysis of the calculus of
variations problem is, however, rather complicated and will not be pursued here.

The basic ideas discussed here can be extended in various ways, for example to
integrands which involve higher derivatives of y or to problems which involve
minimising a multiple integral over some given domain.

Further Reading
R. Weinstock, Calculus of Variations
M. J. Forray, Variational Calculus in Science and Engineering
L. A. Pars, An Introduction to the Calculus of Variations
