
Sturm-Liouville boundary value problems: two different approaches

Jorge Arce Garro

December 11, 2018

Abstract

Sturm-Liouville boundary value problems are important in several branches of PDEs, physics and engineering. They are also theoretically interesting in their own right. We will carry out an exploration of these BVPs using two different frameworks.
First, we shall show how the general methods of functional analysis can be applied to the operator L and yield, with elegance, several results on the nature of the eigenvalues of the problem. Second, we shall use a direct, parametric approach to study u and p(x)u'. Much can be said of these problems via elementary arguments, by using a particular set of polar coordinates called the Prüfer variables.

1 Introduction
We start with the definition of a Sturm-Liouville boundary value problem (S-L BVP). It
is the second-order differential equation given by

L u(x) = \lambda u(x), \qquad L = \frac{1}{r(x)}\left[-\frac{d}{dx}\left(p(x)\frac{d}{dx}\right) + q(x)\right] \tag{1}
subject to the following boundary conditions:

BC_a(u) := \cos(\alpha)\, u(a) - \sin(\alpha)\, p(a)\, u'(a) = 0 \tag{2}

BC_b(u) := \cos(\beta)\, u(b) - \sin(\beta)\, p(b)\, u'(b) = 0
for a < b, α, β ∈ R.
Remark: The angles α and β are here to give us versatility to describe many homogeneous
boundary conditions in a single equation. For example:

• For an angle equal to 0 we have a boundary condition that only depends on u
(Dirichlet boundary condition).
• For an angle equal to π/2 we have a boundary condition that only depends on u'
(Neumann boundary condition).
• For angles not equivalent to either of these mod π, we have a mixed boundary condition (Robin
boundary condition).
Note that (1) is written as an eigenvalue problem for the linear operator L. Indeed, as
part of solving the problem, we will have to determine the values of λ that allow solutions.
We shall call these the eigenvalues of (1), and the associated solutions u shall be called
eigenfunctions of (1).
These problems are important in several areas of physics and engineering. Perhaps the
most widespread place in which they appear is when the method of separation of variables
is applied to solve partial differential equations subject to boundary conditions. When
successful, this method splits a PDE into several ODEs, and for many of these PDEs (e.g.
heat equation, wave equation, Laplace’s equation) we end up with problems of the form
of (1), at least when the geometry of the boundary is simple enough.

Definition 1.1. The Sturm-Liouville boundary value problem is said to be regular if
p(x), r(x) > 0, and p(x), p'(x), q(x), and r(x) are continuous over the finite interval
[a, b].

We shall only consider regular Sturm-Liouville BVPs here.

Example: One of the simplest examples of a S-L BVP is given by taking [a, b] = [0, π],
p = r = 1, q = 0, α = β = 0. We are left with:

-\frac{d^2 u}{dx^2} = \lambda u

subject to u(0) = u(π) = 0.
Let us restrict our attention to real solutions (therefore, to real eigenvalues as well).
This is a second order, homogeneous ODE of constant coefficients. Its solutions depend
on the sign of λ :
• For λ > 0, solutions are sinusoidal
• For λ = 0, solutions are linear
• For λ < 0, solutions are exponential
It is easy to check that the only way to satisfy the boundary conditions is with sinusoidal
functions. Therefore, we have λ > 0, and setting \sqrt{\lambda} = \omega we get:

u(x) = C cos(ωx) + D sin(ωx)
Plugging in the boundary conditions, we see:

0 = C cos 0 + D sin 0 ⇒ C = 0
and, assuming that u is not identically zero,

0 = D sin πω ⇒ ω = n ∈ N

Let us discard the trivial eigenfunction sin(0x). The eigenfunctions are of the form
u_n(x) = D \sin((n+1)x) and the eigenvalues are \{(n+1)^2 : n \in \mathbb{N}\}.
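The computation above can be cross-checked numerically. The following is a sketch (hypothetical code, not part of the original text) that discretizes −u'' = λu on [0, π] with Dirichlet conditions by standard second-order finite differences; the smallest eigenvalues of the resulting matrix should approximate (n + 1)².

```python
import numpy as np

# Hypothetical numerical check (not from the paper): discretize -u'' = lam*u
# on [0, pi] with u(0) = u(pi) = 0 using second-order finite differences.
# The smallest eigenvalues should approximate (n+1)^2 for n = 0, 1, 2, ...
N = 500                        # number of interior grid points
h = np.pi / (N + 1)            # grid spacing

# Tridiagonal matrix representing -d^2/dx^2 with Dirichlet boundary conditions
main = (2.0 / h**2) * np.ones(N)
off = (-1.0 / h**2) * np.ones(N - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

eigvals = np.sort(np.linalg.eigvalsh(A))
print(eigvals[:4])             # close to [1, 4, 9, 16]
```

The discretization error is O(h²), so refining the grid pushes the computed values toward the exact eigenvalues.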


The example above will serve to illustrate several important properties of S-L problems
throughout our exploration.

2 A functional analysis approach


2.1 Compact, symmetric operators
One of the first things we would like to gain insight on is the distribution of the eigenvalues
of (1). For our particular example above, we have a countable sequence of eigenvalues
which tends to infinity. Is this always the case?
Functional analysis provides us with a whole family of operators for which much can
be said about the eigenvalues:

Definition 2.1. Let X be a Banach space (in our context, this will be a space of functions).
A linear operator A on X is said to be compact if for any bounded sequence (f_n) there is a
subsequence (f_{n_k}) such that A(f_{n_k}) is convergent.

The following theorem is a standard result, found in any functional analysis textbook.

Theorem 2.1. (Spectral theorem for compact, symmetric operators): Let H be a Hilbert
space with inner product h , i and let A be a bounded, compact operator on H. Assume,
furthermore, that A is symmetric, that is:

∀f, g ∈ H (hg, Af i = hAg, f i)


Then there exists a sequence of real eigenvalues α_j converging to 0. The corresponding
normalized eigenvectors u_j form an orthonormal set, and every f ∈ Range(A) can be written
as a (possibly infinite) linear combination of the u_j's:

f = \sum_{j} \langle u_j, f \rangle\, u_j

(Note this implies that if Range(A) is dense, then the eigenvectors form an orthonormal
basis.)

It wouldn’t make much sense to try to apply this theorem to the operator L right
away, given that we saw that there is at least one case (the example above) in which the
sequence of eigenvalues tends to infinity, rather than zero!
However, precisely for this reason, one might be inclined to look at the inverse of the
operator L, which would have eigenvalues equal to the reciprocal of these, which would in
turn tend to zero. This shall be our strategy.

2.2 L is symmetric in an appropriate subspace


The symmetry of an operator is a property preserved even when taking inverses, so writing
down an appropriate Hilbert space for our solutions in which L is symmetric is valuable
work.
Define H as the space of continuous functions (can be taken complex-valued) in [a, b],
equipped with the following inner product:
\langle f, g \rangle = \int_a^b \overline{f(x)}\, g(x)\, r(x)\, dx
That is, r shall play the role of a density function in our inner product.

Next, we would like to define L as an operator on H. Unfortunately, there are functions
in H which are not even differentiable, and we need to differentiate twice. Furthermore,
we are only going to consider as inputs functions which satisfy the boundary conditions
(2). This means we will need to define the following subspace:

H_0 = \{ f \in C^2[a, b] : BC_a(f) = BC_b(f) = 0 \}

and with it, we are able to prove:

Lemma 2.1. The operator L : H_0 → H is symmetric.

Proof. We sketch the proof, omitting computational details. By using integration by parts
twice, it is easy to show the Lagrange identity:

\langle g, Lf \rangle = W_a(g, f) - W_b(g, f) + \langle Lg, f \rangle \tag{3}
where Wx (f, g) denotes the Wronskian at x of the functions f and g, modified by multi-
plying by p(x):

W_x(f, g) = p(x)\left( f(x)\, g'(x) - g(x)\, f'(x) \right)


It can be seen that under the conditions BC_a = BC_b = 0 for f and g, the Wronskians
above vanish. This leaves us with \langle g, Lf \rangle = \langle Lg, f \rangle, i.e., L is symmetric!
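To make the lemma concrete, here is a small numerical check (a sketch under assumed coefficients, not the paper's computation): with p = r = 1, q = 0 on [0, π] and Dirichlet conditions, the test functions f(x) = x(π − x) and g(x) = sin(x) both lie in H_0, and ⟨g, Lf⟩ should agree with ⟨Lg, f⟩.

```python
import numpy as np

# Sketch (assumed setup, not from the paper): verify <g, Lf> = <Lg, f> for
# p = r = 1, q = 0 on [0, pi], Dirichlet conditions (alpha = beta = 0).
# f(x) = x*(pi - x) and g(x) = sin(x) both satisfy the boundary conditions.
x = np.linspace(0.0, np.pi, 100001)
h = x[1] - x[0]

f = x * (np.pi - x)
Lf = 2.0 * np.ones_like(x)    # Lf = -f'' = 2
g = np.sin(x)
Lg = np.sin(x)                # Lg = -g'' = sin(x)

# Both integrands vanish at the endpoints, so a plain Riemann sum is already
# the trapezoidal rule here.
lhs = np.sum(g * Lf) * h      # <g, Lf>  (real functions: no conjugate needed)
rhs = np.sum(Lg * f) * h      # <Lg, f>
print(lhs, rhs)               # both close to 4
```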

2.3 Showing compactness for (L − zI)^{-1}


Now, although we want to work with the inverse of L, we might run across the situation
of L not being invertible. However, we can always find a fixed z ∈ C such that z is not
an eigenvalue of L, and so L − zI is an invertible operator. This transformation will only
shift the eigenvalues of L by z, and so we will still be able to study them.
By doing this, we are able to prove:
Theorem 2.2. The operator R_L(z) = (L − zI)^{-1} : H → H_0 is a bounded, compact operator.
Although the proof uses interesting techniques, it is not especially illuminating and is
computationally very heavy, so we will just sketch the main ideas below:
i. Write (1) as a first-order system of equations by using a variant of the classic sub-
stitution used to this end:

w = p(x)\, y'
ii. To find the inverse of L − zI, we want to solve the non-homogeneous equation (L −
zI)f = g for f, given g. Use variation of parameters on the system above to do
so, noting that the homogeneous differential equation associated to (L − zI)f = g is
precisely (1) with eigenvalue z.
iii. Let us denote R_L(z) = (L − zI)^{-1}. The solution we just obtained can be written in
the form:

f(x) = R_L(z)\, g(x) = \int_a^b G(z, x, t)\, g(t)\, r(t)\, dt

where G denotes the Green's function:

G(z, x, t) = \frac{1}{W(u_b(z), u_a(z))} \begin{cases} u_b(z, x)\, u_a(z, t), & x \geq t \\ u_b(z, t)\, u_a(z, x), & x \leq t \end{cases}

and u_a(z, ·) and u_b(z, ·) denote any two solutions of the homogeneous equation (L − zI)f =
0 adapted to the boundary conditions.

iv. We now have R_L(z) in the form of an integral operator (over a bounded interval) against
the measure induced by the density function r, where its kernel G is continuous. The
methods of functional analysis guarantee that any operator of this form is bounded
and compact. (To see this, one takes any bounded sequence f_n in H and shows
that R_L(z)f_n is an equicontinuous, uniformly bounded sequence of functions; the
Arzelà–Ascoli theorem then guarantees the existence of a convergent subsequence.)
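Step iii can be illustrated with our running example (a sketch under assumed data, not the paper's own computation): for −u'' = λu on [0, π] with Dirichlet conditions, z = 0 is not an eigenvalue, and the Green's function is G(x, t) = x(π − t)/π for x ≤ t (symmetric in x and t). Discretizing the integral operator R_L(0) yields a symmetric matrix whose eigenvalues approximate 1/(n + 1)² and accumulate at 0, exactly as the theory predicts.

```python
import numpy as np

# Sketch (assumed example, not from the paper): discretize R_L(0) = L^{-1}
# for -u'' = lam*u on [0, pi], u(0) = u(pi) = 0, whose Green's function is
# G(x, t) = x*(pi - t)/pi for x <= t.  Its eigenvalues should be 1/(n+1)^2.
N = 400
h = np.pi / (N + 1)
x = np.linspace(h, np.pi - h, N)           # interior quadrature nodes

X, T = np.meshgrid(x, x, indexing="ij")
G = np.where(X <= T, X * (np.pi - T), T * (np.pi - X)) / np.pi

K = G * h                                   # integral operator as a matrix
alphas = np.sort(np.linalg.eigvalsh(K))[::-1]
print(alphas[:3])                           # close to [1, 1/4, 1/9]
```

Note how the eigenvalues of the inverse visibly tend to 0, which is exactly the setting of the spectral theorem for compact, symmetric operators.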

2.4 Consequences for L


We conclude this section with the following result:

Theorem 2.3. The regular Sturm-Liouville problem has a countable number of discrete,
simple eigenvalues En which accumulate only at ∞. The corresponding normalized eigen-
functions un can be chosen real-valued and form an orthonormal basis for H0 .

To do this, we pick a λ ∈ ℝ such that R_L(λ) exists. It is elementary to check that
L is symmetric if and only if (L − λI)^{-1} is symmetric. Thus, we can apply the spectral
theorem to R_L(λ) to find it has a countable sequence of eigenvalues α_n → 0. The corresponding
eigenvalues of L are of the form E_n = λ + 1/α_n, which are discrete and diverge to ∞.
The fact that the eigenvalues are simple can be checked by noting that if u_n and v_n were
two linearly independent eigenfunctions associated to the eigenvalue E_n, then the Wronskian
W(u_n, v_n) would vanish, implying these functions are in fact linearly dependent, a contradiction.

With this we finish our use of functional analysis to study (1). We have shed consid-
erable light on the nature of its eigenvalues. Now we turn our attention to the behavior
of its eigenfunctions, by virtue of our next tool.

3 A direct approach: Prüfer variables


3.1 Definition and equivalent system
Definition 3.1. Let u(x) be a solution of (1) subject to (2). The Prüfer variables
ρ_u, θ_u associated to u are defined by applying polar coordinates to the pair (p(x)u'(x), u(x)). Specifically:

u(x) = \rho_u(x)\, \sin(\theta_u(x)), \qquad p(x)\, u'(x) = \rho_u(x)\, \cos(\theta_u(x)) \tag{4}

It is natural to consider p(x)u'(x) instead of just u'(x) if we notice the form of the
operator L in (1), which contains the derivative of p(x)u'(x).
For these coordinates to be uniquely defined, we shall assume WLOG that ρ_u never
vanishes (note that if ρ_u(x_0) = 0 we would have u(x_0) = u'(x_0) = 0, and by uniqueness
of solutions we would be talking about the identically zero function). More importantly, the range of θ_u
is not necessarily [0, 2π): it will be assumed large enough to guarantee θ_u is a continuous
function (one that doesn't jump from 2π back to 0) as the curve (p(x)u'(x), u(x)) winds
about the origin.
Let us apply the change of variables to (1). For clarity, we shall omit the variable x in
our functions. We get:

-\frac{1}{r}\left[ \frac{d}{dx}\left( p u' \right) \right] + q u = \lambda u \;\Rightarrow\; \frac{d}{dx}\left( p u' \right) = -(\lambda r - q)\, u \tag{5}
Recall that in general, for polar coordinates (X, Y) = (ρ cos θ, ρ sin θ) we have the formulas:

\rho' = \frac{X X' + Y Y'}{\rho}, \qquad \theta' = \frac{Y' X - X' Y}{\rho^2}
In our case X = p u', Y = u, and with help of (5) above we get:

X' = -(\lambda r - q)\, Y, \qquad Y' = X/p

and so:

\rho_u' = \frac{-X (\lambda r - q) Y + Y X / p}{\rho_u}, \qquad \theta_u' = \frac{X^2/p + (\lambda r - q)\, Y^2}{\rho_u^2}

which after some simple algebra reduces to the system:

\rho_u' = \rho_u \left( \frac{1}{p} + q - \lambda r \right) \frac{\sin(2\theta_u)}{2} \tag{6}

\theta_u' = \frac{\cos^2(\theta_u)}{p} + (\lambda r - q)\, \sin^2(\theta_u) \tag{7}

As is usual with this kind of substitution, we have exchanged a second-order ODE
for a 2×2 system of first-order ODEs. However, a remarkable feature of this new system
is that the equation for θ_u' is independent of ρ_u, and so we have a standalone first-order ODE which
describes the Prüfer angle! Also note that the equation for ρ_u' is separable once θ_u is known.
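For the running example (p = r = 1, q = 0 on [0, π]), equation (7) reduces to θ' = cos²(θ) + λ sin²(θ), which we can integrate directly. The sketch below (hypothetical code, plain RK4) confirms that for the eigenvalues λ = (n + 1)² the Prüfer angle reaches exactly (n + 1)π at x = π, encoding the boundary condition u(π) = 0.

```python
import math

# Sketch for the running example (p = r = 1, q = 0 on [0, pi]): integrate the
# Prufer angle equation theta' = cos^2(theta) + lam*sin^2(theta) with
# theta(0) = 0 (encoding u(0) = 0) via classical RK4.
def prufer_angle_at_pi(lam, steps=20000):
    f = lambda th: math.cos(th) ** 2 + lam * math.sin(th) ** 2
    h = math.pi / steps
    th = 0.0
    for _ in range(steps):
        k1 = f(th)
        k2 = f(th + 0.5 * h * k1)
        k3 = f(th + 0.5 * h * k2)
        k4 = f(th + h * k3)
        th += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return th

# For an eigenvalue lam = (n+1)^2, theta(pi) should equal (n+1)*pi exactly.
for n in range(3):
    lam = (n + 1) ** 2
    print(lam, prufer_angle_at_pi(lam) / math.pi)   # close to n + 1
```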

3.2 Oscillatory nature of the eigenfunctions
The study of Prüfer angles alone is enough for us to deduce interesting properties of the
eigenfunctions of (1). Let us start by studying the sign of θ_u' in (7).
We know that 1/p > 0. Also, since r and q are continuous on [a, b] they are bounded,
which implies that for a sufficiently large λ (which we know we can find, thanks to the
results in section 2) we have λr − q > 0. In fact, due to continuity both
of these coefficients are bounded away from zero, and so from the shape of (7) we see
there is a positive constant K such that θ_u' > K. As such, from (4) we see any u with
a large enough eigenvalue must have an oscillatory behavior, since the polar angle has a
minimum, positive rate of increase.
Furthermore, the larger λ is, the larger θ_u' becomes wherever u ≠ 0. This indicates that the
eigenfunctions will oscillate faster as λ increases!
Now, if the eigenvalue λ is not large enough, we cannot guarantee that θ_u is increasing
everywhere. But we can note the following: at a zero of u, the Prüfer angle is always increasing, as we
can see from (7) and from u(x) = ρ_u(x) sin(θ_u(x)):

u(x_0) = 0 \;\Leftrightarrow\; \theta_u(x_0) \equiv 0 \ (\mathrm{mod}\ \pi) \;\Rightarrow\; \theta_u'(x_0) = \frac{1}{p(x_0)} > 0
This leads us to:

Lemma 3.1. Regardless of the magnitude of λ, a Prüfer angle can cross integer multiples
of π only from below. This means that, between consecutive zeros of u, the Prüfer angle must
increase by exactly π.

3.3 Sturm’s comparison theorem and the layout of the zeros of the eigenfunctions
These seemingly innocent observations about the Prüfer angles can be used to prove the
celebrated Sturm’s comparison theorem. For this, we first need the following:

Lemma 3.2. (Monotonicity of θ_u with respect to the coefficients of (7)):

Consider two different Sturm-Liouville operators L_j for j = 0, 1, with their respective
p_j, q_j and r_j. Let u_j be solutions of L_j u_j = λ_j u_j, and call the respective Prüfer angles θ_j.
Assume that 1/p_1 ≥ 1/p_0 and λ_1 r_1 − q_1 ≥ λ_0 r_0 − q_0. Then:
i. If θ_1(c) ≥ θ_0(c) for some c ∈ (a, b), then θ_1(x) ≥ θ_0(x) for all x ∈ [c, b). If the
inequality becomes strict at some x ∈ [c, b), it remains so.
ii. Moreover, if the angles θ_0(x) and θ_1(x) coincide at two points, say x = c and x = d
(with c < d), then p_1 = p_0 and λ_0 r_0 − q_0 = λ_1 r_1 − q_1 on [c, d].

Proof. i. It is known that if θ_1(c) ≥ θ_0(c) and θ_1'(x) ≥ θ_0'(x) for all x ∈ [c, b), then
θ_1(x) ≥ θ_0(x) also holds on [c, b), with the inequality staying strict after any point
at which it becomes so.

One can prove, by elementary means, that this result still holds after changing the condition
"θ_1'(x) ≥ θ_0'(x)" to

\theta_1'(x) - f(x, \theta_1(x)) \geq \theta_0'(x) - f(x, \theta_0(x))

for any fixed function f(x, y) which is locally Lipschitz continuous with respect to y,
uniformly in x.

In this case, pick

f(x, y) = \frac{\cos^2(y)}{p_0(x)} + (\lambda_0 r_0(x) - q_0(x))\, \sin^2(y)

It is easy to check the required uniform Lipschitz condition from the fact that 1/p_0(x) and
λ_0 r_0(x) − q_0(x) are bounded on [a, b].
With this f, θ_0'(x) − f(x, θ_0(x)) = 0, whereas θ_1'(x) − f(x, θ_1(x)) is given by:

\frac{\cos^2(\theta_1)}{p_1(x)} + (\lambda_1 r_1 - q_1)\, \sin^2(\theta_1) - \left[ \frac{\cos^2(\theta_1)}{p_0(x)} + (\lambda_0 r_0 - q_0)\, \sin^2(\theta_1) \right]
= \cos^2(\theta_1) \left( \frac{1}{p_1(x)} - \frac{1}{p_0(x)} \right) + \sin^2(\theta_1) \left( (\lambda_1 r_1 - q_1) - (\lambda_0 r_0 - q_0) \right) \geq 0

So, using the result we stated above, we have i.
ii. Using the first part, we note that the inequality θ_1(x) ≥ θ_0(x) cannot become strict
at any point of the interval (or else we would have θ_1(d) > θ_0(d)). This means θ_1 = θ_0
on [c, d], and subtracting the corresponding differential equations of the form (7), we
conclude the coefficients must be equal.

Now we’re ready to prove:

Theorem 3.1. (Sturm’s comparison theorem): Consider two different Sturm-Liouville
operators L_j for j = 0, 1, with their respective p_j, q_j and r_j. Let u_j be solutions of L_j u_j =
λ_j u_j, and call the respective Prüfer angles θ_j. Assume that 1/p_1 ≥ 1/p_0 and λ_1 r_1 − q_1 ≥
λ_0 r_0 − q_0.
Let (c, d) ⊆ (a, b). Suppose that at each end of (c, d) we have:

W(u_0, u_1) = 0 \quad \text{or} \quad u_0 = 0 \tag{8}

(where W denotes the Wronskian).
If the functions u_0 and u_1 are linearly independent on (c, d) (i.e., not a constant multiple
of each other), then u_1 has at least one zero in (c, d).

Proof. Let us assume, without loss of generality, that the Prüfer angles at c are in [0, π).
(They can be taken to lie in [−π, π) as any polar angle, and if needed, we reverse the signs
of u0 or u1 to have the polar angle in the desired interval.)
The Wronskian W(u_0, u_1) is readily calculated to give:

W(u_0, u_1) = \rho_0\, \rho_1\, \sin(\theta_0 - \theta_1) \tag{9}


Therefore, since we know the Prüfer radius never vanishes and keeping in mind the
bounds on θj (c), (8) is equivalent to

θ0 (c) = θ1 (c) or θ0 (c) = 0


Either way, we have θ_0(c) ≤ θ_1(c), and so the lemma above implies θ_0(d) < θ_1(d) unless
θ_0 = θ_1. But the latter would imply, going back to the ODE (6), that ρ_0 and ρ_1 are equal
up to a multiplicative constant, and so the same could be said of u_0 and u_1. Since this
is excluded by hypothesis, we have proved θ_0(d) < θ_1(d).

But note this means θ_1(d) ≥ π. Indeed, applying the condition (8) at x = d (with
reasoning similar to that used above for x = c), we get one of two cases:
• θ_0(d) ≡ θ_1(d) (mod π): by virtue of θ_0(d) < θ_1(d), we must have

θ_1(d) − θ_0(d) ≥ π, and since θ_0(d) ≥ 0, therefore θ_1(d) ≥ π

• θ_0(d) ≡ 0 (mod π): we can’t have θ_0(d) ≤ 0 because θ_0(c) ∈ [0, π) and Prüfer angles
can cross multiples of π only from below, as noted in Lemma 3.1. Therefore θ_0(d) ≥ π, which implies
θ_1(d) > π.
Finally, since θ_1(c) < π ≤ θ_1(d), the intermediate value theorem gives us an x* with θ_1(x*) = π, which
yields a zero of u_1, as desired.

By keeping the coefficients p, q and r fixed, and just increasing the eigenvalue, we
conclude:

Corollary 3.2.1. Let u_0 and u_1 be eigenfunctions of (1) associated to eigenvalues λ_0
and λ_1, with λ_1 > λ_0. Then between two zeros of u_0 one can find a zero of u_1.
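For a concrete illustration (hypothetical check, assuming the elementary example −u'' = λu on [0, π]): take u_0 = sin(2x) with λ_0 = 4 and u_1 = sin(3x) with λ_1 = 9; between any two consecutive zeros of u_0 there is indeed a zero of u_1.

```python
import math

# Hypothetical check of the corollary for -u'' = lam*u on [0, pi]:
# u0 = sin(2x) (lam0 = 4) has zeros at k*pi/2, while u1 = sin(3x) (lam1 = 9)
# has zeros at k*pi/3.  Between consecutive zeros of u0 there must lie a
# zero of u1.
zeros_u0 = [k * math.pi / 2 for k in range(0, 3)]   # 0, pi/2, pi
zeros_u1 = [k * math.pi / 3 for k in range(0, 4)]   # 0, pi/3, 2*pi/3, pi

ok = all(
    any(lo < z < hi for z in zeros_u1)
    for lo, hi in zip(zeros_u0, zeros_u0[1:])
)
print(ok)   # True: each gap (0, pi/2) and (pi/2, pi) contains a zero of u1
```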

3.4 Remark: on the interlacing of zeros


More careful estimates with the Prüfer angles allow us to show that, once we have our
sequence of eigenvalues ordered as E_0 < E_1 < ..., the corresponding eigenfunctions
u_n have exactly n zeros in (a, b). (Note this is in agreement with the elementary example
we did at the beginning on [0, π], in which the eigenfunctions are sin((n + 1)x).)
Knowing the exact number of zeros, along with the corollary we got above, allows us
to be much more specific about the relative position of the zeros of the eigenfunctions:

Corollary 3.2.2. Let u_n be the eigenfunctions of (1), sorted according to the size of the
eigenvalues. Then the zeros of u_{n+1} interlace the zeros of u_n. This means, if x_{n,j} are
the zeros of u_n inside (a, b), then:

a < x_{n+1,1} < x_{n,1} < x_{n+1,2} < \dots < x_{n+1,n+1} < b

A quick counting argument shows this is the only possible behavior if between any two zeros
of u_n there must always lie a zero of u_{n+1}.
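In the elementary example, where u_n = sin((n + 1)x) on [0, π], the interlacing can be verified directly (a hypothetical sketch): the zeros of u_n inside (0, π) are kπ/(n + 1), and merging them with the zeros of u_{n+1} in the pattern of Corollary 3.2.2 gives a strictly increasing sequence.

```python
import math

# Hypothetical check of the interlacing for u_n = sin((n+1)x) on [0, pi]:
# the interior zeros of u_n are k*pi/(n+1), k = 1, ..., n.
def interior_zeros(n):
    return [k * math.pi / (n + 1) for k in range(1, n + 1)]

def interlaces(n):
    zn = interior_zeros(n)          # n zeros of u_n
    znext = interior_zeros(n + 1)   # n + 1 zeros of u_{n+1}
    # Expected pattern: z_{n+1,1} < z_{n,1} < z_{n+1,2} < ... < z_{n+1,n+1}
    merged = [v for pair in zip(znext, zn + [None]) for v in pair if v is not None]
    return all(a < b for a, b in zip(merged, merged[1:]))

print([interlaces(n) for n in range(1, 6)])   # all True
```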

Figure 1: Illustration of the interlacing of zeros for the eigenfunctions u_3 = sin(4x) and u_4 = sin(5x), for the elementary example −u'' = λu.

