You are on page 1of 146

SPRINGER BRIEFS IN MATHEMATICS

Ludovic Rifford

Sub-Riemannian
Geometry and
Optimal Transport

SpringerBriefs in Mathematics

Series editors
Krishnaswami Alladi, Gainesville, USA
Nicola Bellomo, Torino, Italy
Michele Benzi, Atlanta, USA
Tatsien Li, Shanghai, Peoples Republic of China
Matthias Neufang, Ottawa, Canada
Otmar Scherzer, Vienna, Austria
Dierk Schleicher, Bremen, Germany
Vladas Sidoravicius, Rio de Janeiro, Brazil
Benjamin Steinberg, New York, USA
Yuri Tschinkel, New York, USA
Loring W. Tu, Medford, USA
G. George Yin, Detroit, USA
Ping Zhang, Kalamazoo, USA

SpringerBriefs in Mathematics showcases expositions in all areas of mathematics and applied mathematics. Manuscripts presenting new results or a single new
result in a classical field, new field, or an emerging topic, applications, or bridges
between new results and already published works, are encouraged. The series is
intended for mathematicians and applied mathematicians.
For further volumes:
http://www.springer.com/series/10030

BCAM SpringerBriefs
Editorial Board
Enrique Zuazua
BCAM - Basque Center for Applied Mathematics & Ikerbasque
Bilbao, Basque Country, Spain
Irene Fonseca
Center for Nonlinear Analysis
Department of Mathematical Sciences
Carnegie Mellon University
Pittsburgh, USA
Juan J. Manfredi
Department of Mathematics
University of Pittsburgh
Pittsburgh, USA
Emmanuel Trelat
Laboratoire Jacques-Louis Lions
Institut Universitaire de France
Universite Pierre et Marie Curie
CNRS, UMR, Paris
Xu Zhang
School of Mathematics
Sichuan University
Chengdu, China
BCAM SpringerBriefs aims to publish contributions in the following disciplines:
Applied Mathematics, Finance, Statistics and Computer Science. BCAM has
appointed an Editorial Board that will evaluate and review proposals.
Typical topics include: a timely report of state-of-the-art analytical techniques,
bridge between new research results published in journal articles and a contextual
literature review, a snapshot of a hot or emerging topic, a presentation of core concepts that students must understand in order to make independent contributions.
Please submit your proposal to the Editorial Board or to Francesca Bonadei,
Executive Editor Mathematics, Statistics, and Engineering: francesca.bonadei@
springer.com

Ludovic Rifford

Sub-Riemannian
Geometry and
Optimal Transport

123

Ludovic Rifford
Laboratoire J.A. Dieudonn
Universit Nice Sophia Antipolis
Nice
France

ISSN 2191-8198
ISSN 2191-8201 (electronic)
ISBN 978-3-319-04803-1
ISBN 978-3-319-04804-8 (eBook)
DOI 10.1007/978-3-319-04804-8
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014933272
 The Author(s) 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the
work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publishers location, in its current version, and permission for use must
always be obtained from Springer. Permissions for use may be obtained through RightsLink at the
Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The main goal of these lectures is to give an introduction to sub-Riemannian


geometry and optimal transport, and to present some of the recent progress in these
two fields. This set of notes is divided into three chapters and two appendices.
Chapter 1 is concerned with the notions of totally nonholonomic distributions and
sub-Riemannian structures. The concepts of End-Point mappings and singular
horizontal paths which play a major role through these lectures are introduced
here. Chapter 2 deals with sub-Riemannian geodesics. We study first- and
second-order variations of the End-Point mapping to derive necessary and sufficient conditions for an horizontal path to be minimizing. In Chap. 3, we study the
Monge problem for sub-Riemannian quadratic costs. We give a crash-course in
optimal transport theory and explain how the sub-TWIST condition together with
the Lipschitz regularity of a variational cost implies the well-posedness of
Monges problem. Then, we study the fine regularity properties of sub-Riemannian
distances to obtain existence and uniqueness of optimal transport maps in the
sub-Riemannian context. We recall basic facts on ordinary differential equations in
Appendix A and less classical results of differential calculus in normed vector
spaces in Appendix B. The latter plays a key role in Chap. 2.
The reader of these notes should be familiar with the basics in differential
geometry and measure theory. For further reading, we strongly encourage the
reader to look at other texts in sub-Riemannian geometry and optimal transport.
Multiple viewpoints always lead to deeper understanding and may open new
directions for research. Among them, we may suggest the textbooks by Montgomery [2], Agrachev, Barilari and Boscain [1], and Villani [3].
This set of notes grew from a series of lectures that I gave during a CIMPA
school in Beyrouth, Lebanon, on the invitation of Fernand Pelletier. I take the
opportunity of this preface to warmly thank Ali Fardoun, Mohamad Mehdi, and
Fernand Pelletier who organized the school, Ahmed El Soufi for his support and
friendship, and through him the Centre International de Mathmatiques Pures et
Appliques. My gratitude goes also to all faculties and students who attended this
sub-Riemannian CIMPA school in making it a success.
Nice, June 2013

Ludovic Rifford

vi

Preface

References
1. Agrachev, A., Barilari, D., Boscain, U.: Introduction to Riemannian and sub-Riemannian
geometry. To appear
2. Montgomery, R.: A tour of subriemannian geometries, their geodesics and applications.
In: Mathematical Surveys and Monographs, vol. 91. American Mathematical Society,
Providence, RI (2002)
3. Villani, C.: Optimal transport, Old and new. Springer-Verlag, Heidelberg (2008)

Contents

Sub-Riemannian Structures . . . . . . . . . . . . .
1.1 Totally Nonholonomic Distributions . . . .
1.2 Horizontal Paths and End-Point Mappings
1.3 Regular and Singular Horizontal Paths . . .
1.4 The Chow-Rashevsky Theorem . . . . . . . .
1.5 Sub-Riemannian Structures . . . . . . . . . . .
1.6 Notes and Comments . . . . . . . . . . . . . . .
References . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

1
1
12
20
30
33
36
36

Sub-Riemannian Geodesics. . . . . . . . . . . . . . . .
2.1 Minimizing Horizontal Paths and Geodesics .
2.2 The Hamiltonian Geodesic Equation . . . . . .
2.3 The Sub-Riemannian Exponential Map . . . .
2.4 The Goh Condition . . . . . . . . . . . . . . . . . .
2.5 Examples of SR Geodesics . . . . . . . . . . . . .
2.6 Notes and Comments . . . . . . . . . . . . . . . . .
References . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

37
37
44
52
59
70
75
76

Introduction to Optimal Transport . . . . . . . . . . . . . . . . . .


3.1 The Monge and Kantorovitch Problems . . . . . . . . . . . .
3.2 Optimal Plans and Kantorovitch Potentials . . . . . . . . . .
3.3 A Generalized Brenier-McCann Theorem . . . . . . . . . . .
3.4 Optimal Transport on Ideal and Lipschitz SR Structures
3.5 Back to Examples . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.6 Notes and Comments . . . . . . . . . . . . . . . . . . . . . . . . .
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

77
77
81
91
97
111
116
118

Appendix A: Ordinary Differential Equations . . . . . . . . . . . . . . . . . .

121

Appendix B: Elements of Differential Calculus . . . . . . . . . . . . . . . . . .

125

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

139

vii

Chapter 1

Sub-Riemannian Structures

Throughout all the chapter, M denotes a smooth connected manifold without


boundary of dimension n 2.

1.1 Totally Nonholonomic Distributions


Distributions. A smooth distribution of rank m n (m 1) on M is a rank m
subbundle of the tangent bundle TM, that is a smooth map that assigns to each point
x of M a linear subspace (x) of the tangent space Tx M of dimension m. In other
terms, for every x M, there are an open neighborhood Vx of x in M and m smooth
vector fields Xx1 , . . . , Xxm linearly independent on Vx such that


(y) = Span Xx1 (y), . . . , Xxm (y)

y Vx .

Such a family of smooth vector fields is called a local frame in Vx for the distribution
. All the distributions which will be considered later will be smooth with constant
rank m [1, n]. Thus, from now on, distribution always means smooth distribution with constant rank. A co-rank k distribution on M is a distribution of rank
m = n k and any smooth vector field X on M such that X(x) (x) for any x M
is called a section of .
Example 1.1 We call trivial distribution on M the rank n distribution defined by
(x) = Tx M for all x M. For topological reasons, such a distribution may not
admit non-vanishing sections (for example, by the hairy ball theorem, there is no
non-vanishing continuous vector fields on any even dimensional sphere).
Example 1.2 In R3 with coordinates (x, y, z), the distribution generated by the
vector fields X and Y , that is


(x, y, z) = Span X(x, y, z), Y (x, y, z)
(x, y, z) R3 ,
L. Rifford, Sub-Riemannian Geometry and Optimal Transport,
SpringerBriefs in Mathematics, DOI: 10.1007/978-3-319-04804-8_1,
The Author(s) 2014

1 Sub-Riemannian Structures

with
y
x
X = x z and Y = y + z ,
2
2
is a rank 2 (or co-rank 1) distribution on R3 .
Example 1.3 More generally, if x = (x1 , . . . , xn , y1 , . . . , yn , z) denotes the coordinates in R2n+1 and the 2n smooth vector fields X 1 , . . . , X n , Y 1 , . . . , Y n are defined by
X i = xi

yi
xi
z , Y i = yi + z
2
2

i = 1, . . . , n,

then the distribution generated by the above vector fields is a co-rank 1 distribution
on R2n+1 .
Example 1.4 Let be a smooth non-degenerate 1-form on M, that is a 1-form which
does not vanish (x = 0 for any x M). The distribution defined as
(x) = Ker (x )

x M,

is a co-rank 1 distribution on M.
We say that a given distribution on M admits a global frame if there are m
smooth vector fields X 1 , . . . , X m on M such that


(x) = Span X 1 (x), . . . , X m (x)
x M.
In general, distributions do not admit global frames (see Example 1.1). It is worth
noticing that in the particular case of Rn all distributions are trivial.
Proposition 1.1 Any distribution on Rn admits a global frame.
Proof Let us first show how to construct a non-vanishing section of a given distribution on Rn .
Lemma 1.2 Let be a distribution of rank m on Rn . Then there is a non-vanishing
smooth vector field X such that X(x) (x), for any x Rn .
Proof (Proof of Lemma 1.2) Define the multivalued mapping : Rn 2R by
n



(x) = v (x) | |v| = 1

x Rn .

By construction, is locally Lipschitz with respect to the Hausdorff distance on


n , 2), there is (0, 1) such that
compact subsets of Rn . By compactness of B(0

for any x, y B(0n , 2) with |x y| < , and any v (x), there is w (y) such

1.1 Totally Nonholonomic Distributions

that |v w| < 1. Let N 2 be an integer such that the increasing sequence of balls
B1 , . . . , BN defined by
Bi = B (0n , i)

i = 1, . . . , N,

n , 1) BN . For every x Rn , we denote by Proj(x) the projection onto


satisfies B(0
the (m 1)-dimensional sphere (x). Note that the mapping Proj(x) is well-defined
and smooth on the open set


Ox = w Rn | v, w
= 0 for some v (x) .
For every i {1, . . . , N 1}, consider a smooth mapping Pi : Bi+1 Bi such that
|Pi (x) x| <

x Bi+1 ,

(1.1)

n , 1) Rn as follows:
and let w (0) be fixed. We define the vector field X : B(0
We first set

X1 (x) = Proj(x) (w)

x B1 .

Then, given Xi : Bi Rn , we define Xi+1 : Bi+1 Rn as


 

Xi+1 (x) = Proj(x) Xi Pi (x)

x Bi+1 .



By construction (by (1.1) and the definition of ), Xi Pi (x) belongs to Ox for
n , 1) and satisfies
any x Bi+1 . In conclusion, X = XN is smooth on B(0
0n = X(x) (x) for any x B(0n , 1). Repeating the construction on the annuli
B(0n , 2) \ B(0n , 1), B(0n , 3) \ B(0n , 2), . . ., we obtain a non-vanishing section of


on Rn .
We now prove Proposition 1.1 by induction on m. Let be a rank (m + 1)
distribution on Rn . By Lemma 1.2, it admits a non-vanishing section X on Rn . The
n
multivalued mapping : Rn 2R defined by



(x)
= (x) X(x)

x Rn ,

is a smooth rank m distribution (here {X(x)} denotes the space which is orthogonal to X(x) with respect to the Euclidean scalar product). Thus by induction,
there are smooth vector fields X 1 , . . . , X m which generate on Rn . The family


{X 1 , . . . , X m , X} is a global frame for .

1 Sub-Riemannian Structures

A finite family of smooth vector fields {X 1 , . . . , X k } is called a generating family


for on M if there holds


x M.
(x) = Span X 1 (x), . . . , X k (x)
Since vector fields of a generating family are not necessarily linearly independent,
any distribution can be represented by a generating family.
Proposition 1.3 Let be a distribution of rank m n on M. Then there are k =
m(n + 1) smooth vector fields X 1 , . . . , X k such that {X 1 , . . . , X k } is a generating
family for .
Proof By definition, for every x M, there is an open neighborhood Vx of x in M
and m smooth vector fields Xx1 , . . . , Xxm linearly independent on Vx such that


(y) = Span Xx1 (y), . . . , Xxm (y)

y Vx .

Since M is paracompact, there is a locally finite covering V = {Vi }iI where each
open set Vi equals Vxi for some xi M.
Lemma 1.4 There are a locally finite open covering {Uj }jJ of M and a partition
n+1
l=1 Jl of J such that the following properties are satisfied:
(a) For every j J, there is i = i(j) I such that Uj Vi .
(b) For every l {1, . . . , n + 1} and any j = j Jl , Uj Uj = .
Proof (Proof of Lemma 1.4) Recall that every smooth manifold is triangulable. Let
T = {Tt }tT be a triangulation of M that refines the covering {Vi }iI , in the sense
that the closure of each face F of T is a subset of some Vi . For every {0, . . . , n},
denote by T = {Tt }tT the family of -dimensional faces in T . For every
{0, . . . , n}, we can construct easily a collection of open sets W = {Ws }sS
satisfying the following properties:

W is a refinement of {Vi }iI ;


tT Tt sS Ws ;
each Ws is an open neighborhood of some -dimensional face of T ;
for any s = s S , Ws Ws = ;
for any s = s S0 , Ws Ws = ;
for any {1, . . . , n} and any s = s S , Ws Ws tT1 Tt 1 .

For that, it suffices to proceed by induction on and to make use of the properties
of a triangulation. We conclude easily.


Let us now show how to construct for every r {1, . . . , m} a family of sections
j
j
j
{X1 , . . . , Xn+1 | 1 j r} of such that Span{Xl (x) | 1 j r, 1 l n + 1}
has dimension r for any x M. We proceed by induction on r.

1.1 Totally Nonholonomic Distributions

First, for each l {1, . . . , n + 1} and each j Jl , there is i = i(j) I such


that Uj Vi = Vxi . Modifying Xi1 = Xx1i outside Uj if necessary, we may assume
that Xi1 is defined on M, does not vanish on Uj , and vanishes outside Uj . Define
1
by
X11 , . . . , Xn+1

1
Xl1 =
Xi(j)
l = 1, . . . , n + 1.
jJl
1 s are always
By construction (Lemma 1.4 (b)), the interior of the supports of the Xi(j)
disjoint. Therefore, each Xl1 is a non-vanishing section of on jJl Uj . This shows
that Span{Xl1 (x) | 1 l n + 1} has dimension 1 for any x M.
j
Assume now that we have constructed a family of smooth vector fields {Xi , | 1
j r, 1 i n + 1} such that



j
Span Xl (x) | 1 j r, 1 l n + 1
has dimension r for any x M (with r < m). For every j J, there is s = s(j)
{1, . . . , m} such that


j
Span Xxsi (j) (x), Xl (x) | 1 j r, 1 l n + 1
r+1
by
has dimension r + 1 for any x Uj . Define X1r+1 , . . . , Xn+1

Xlr+1 =

s(j)

Xi(j)

l = 1, . . . , n + 1.

jJl
s(j)

We leave the reader to check that by construction (modifying the Xxi (j) s if necessary
as above), the vector space


j
Span Xl (x) | 1 j r + 1, 1 l n + 1
has dimension r + 1 for any x M. The proof is complete.

The Hrmander condition. Recall that for any smooth vector fields X, Y on M
given by
X(x) =

n

i=1

ai (x)xi , Y (x) =

n


bi (x)xi ,

i=1

in local coordinates x = (x1 , . . . , xn ), the Lie bracket [X,Y ] is the smooth vector field
defined as

1 Sub-Riemannian Structures

Fig. 1.1 The Lie bracket [X, Y ](x) measures the extent to which X and Y do not commute from x

[X, Y ](x) =

n


ci (x)xi ,

i=1

where c1 , . . . , cn are the smooth scalar function given by


ci =

n





xj bi aj xj ai bj

i = 1, . . . , n.

j=1

For the upcoming controllability results, it is important to keep in mind the following
dynamical characterization of the Lie bracket (see Fig. 1.1).
Proposition 1.5 Let X, Y be two smooth vector fields in an neighborhood of x Rn .
Then we have
[X, Y ](x) = Dx Y X(x) Dx X Y (x) = lim



etY etX etY etX (x) x
t2

t0

(1.2)

where etX and etY denote respectively the flows of X and Y .


Proof All the functions appearing in the proof will be defined locally for t close to
0 and/or in a neighborhood of x. Define the smooth function h4 by


h4 (t) := etY etX etY etX (x)

t.

We have h4 (0) = 0. As a matter of fact, we have for any t,


h4 (t)


= Y (h4 (t)) +

etY
x

(t,h3 (t))

h3 (t)



where h3 is defined by h3 (t) := etX etY etX (x). Then we have
h3 (t)


= X(h3 (t)) +

etX
x

(t,h2 (t))

h2 (t),

1.1 Totally Nonholonomic Distributions



where h2 (t) := etY etX (x) and
h2 (t) = Y (h2 (t)) +

etY
x

(t,h1 (t))

h1 (t),

with h1 (t) := etX (x) and h1 (t) = X(etX (x)). Since partial derivatives of the form ex
at t = 0 are equal to Id, we get h1 (0) = X(x), h2 (0) = X(x) + Y (x), h3 (0) = Y (x)
and h4 (0) = 0. Therefore, the left-hand side of (1.2) is equal to 21 h4 (0). By derivating
the above formulas, we get
tX


h1 (0) = Dh1 (0) X h1 (0) = Dx X X(x),
 tY 
e
h2 (0) = Dh2 (0) Y h2 (0) + dtd
x

h1 (t)
(t,h1 (t))


.
t=0

But Dh2 (0) Y h2 (0) = Dx Y (X(x) + Y (x)) and






etY
h (t)
x (t,h1 (t)) 1
t=0



tY

d etY
e
=
h1 (0) +
h (0)
dt x (t,h1 (t))
x (0,h1 (0)) 1
t=0



2 tY

2
tY
e
e

+
h (0) X(x) + Dx X X(x)
=
tx (0,x)
x 2 (0,x) 1
tY

e
=
X(x) + Dx X X(x) = Dx Y X(x) + Dx X X(x).
x
t
(0,x)

d
dt

We infer that h2 (0) = Dx Y (2X(x) + Y (x)) + Dx X X(x). In the same way, we have

h3 (0)

= Dh3 (0) X

h3 (0) +

d
dt



etX
x

(t,h2 (t))

h2 (t)

,
t=0

with Dh3 (0) X h3 (0) = Dx X Y (x) and




d
dt

etX
x



(t,h2 (t))

h2 (t)

= Dx X Y (x) + Dx Y (2X(x) + Y (x)),


t=0

which implies h3 (0) = 2Dx X Y (x) + Dx Y (2X(x) + Y (x)). Finally




d
h4 (0) = Dh4 (0) Y h4 (0) +
dt

etY
x

(t,h3 (t))


h3 (t)
t=0

1 Sub-Riemannian Structures

etY
x

etY
h3 (0)
x
(t,h3 (t)) t=0
(0,h3 (0))


= Dx Y Y (x) 2Dx X Y (x) + Dx Y 2X(x) + Y (x)


= 2 Dx Y X(x) Dx X Y (x) = 2[X, Y ](x),

d
dt

h3 (0) +

which concludes the proof.

Remark 1.1 The Lie bracket is bilinear, skew-symmetric and satisfies the Jacobi
identity, that is given three smooth vector fields X, Y , Z, we have

 
 

X, [Y , Z] + Y , [Z, X] + Z, [X, Y ] = 0.
Remark 1.2 Given a smooth diffeomorphism from a smooth manifold U to a
smooth manifold V and X a smooth vector field on U , we recall that the pushforward (X) of X is defined by


(X)(y) := D 1 (y) X( 1 (y)

y V .

Then if Y ia another smooth vector field on U , we have


[ (X), (Y )] = ([X, Y ]) .
For any family F of smooth vector fields on an open set O M, we denote by
Lie(F ) the Lie algebra of vector fields generated by F . It is the smallest vector
subspace S of X (M) (the space of smooth vector fields on M) containing F that
also satisfies
[X, Y ] S
X F , Y S.
It can be constructed as follows: Denote by Lie1 (F ) the space spanned by F in
X (M) and define recursively the spaces Liek (F ) (k = 1, 2, . . .) by



Liek+1 (F ) = Span Liek (F ) [X, Y ] | X F , Y Liek (F )

k 1.

This defines an increasing sequence of vector spaces in X (M) satisfying


Lie(F ) =

Liek (F ).

k1

In general, Lie(F ) is an infinite-dimensional subspace of X (M).


Example 1.5 Let A be a nn real matrix, b be a vector in Rn , and X, Y be the smooth
vector fields in Rn defined by
X(x) = Ax, Y (x) = b

x Rn .

1.1 Totally Nonholonomic Distributions

The non-zero Lie brackets of X and Y are always constant vector fields of the form


ad0X (Y ) := Y = b and adXk+1 (Y ) := X, adkX (Y ) = (1)k+1 Ak+1 b

k 0.

By the Cayley-Hamilton Theorem, An can be expressed as a linear combination of


A0 , . . . , An1 . Therefore, Lie(X, Y ) is the set of vector fields Z in Rn of the form
Z(x) = Ax +

n1


i Ai b

x Rn ,

i=0

with , 0 , . . . , n1 R. It is a finite-dimensional Lie algebra.


Example 1.6 Let X, Y be the two smooth vector fields in R2 (with coordinates x =
(x1 , x2 )) defined by
X(x) = x1 , Y (x) = f (x1 )x2

x R2 ,

where f is a smooth scalar function. Then, Lie(X, Y ) is the space of smooth vector
fields spanned by X and
adkY (X) = f (k) x2

for k 0.

Thus, Lie(X, Y ) is infinite-dimensional whenever the derivatives of f span an infinitedimensional space of functions.
For any point x M, Lie(F )(x) denotes the set of all tangent vectors X(x) with
X Lie(F ). It follows that Lie(F )(x) is always a linear subspace of Tx M, hence
finite-dimensional.
Example 1.7 Returning to Example 1.6 and denoting by (e1 , e2 ) the canonical basis
of R2 , we check that


x R2 .
Lie(X, Y )(x) = Span e1 , f (k) (x1 )e2 | k = 0, 1, 2, . . .
In particular, Lie(X, Y )(x) = Re1 if f (x) and all its derivatives at x vanish and
Lie(X, Y )(x) = R2 otherwise.
We say that the smooth vector fields X 1 , . . . , X m satisfy the Hrmander condition
on some open set O M if and only if


Lie X 1 , . . . , X m (x) = Tx M

x O.

A distribution on M is called totally nonholonomic on M if for every x M, there


are an open neighborhood Vx of x in M and a local frame Xx1 , . . . , Xxm on Vx which

10

1 Sub-Riemannian Structures

satisfies the Hrmander condition on Vx . Moreover, we call degree of nonholonomy


(or simply degree) of at x the smallest integer r = r(x) 1 such that


Lier X 1 , . . . , X m (x) = Tx M.
These definitions are intrinsic, they do not depend upon the choice of the local frame
Xx1 , . . . , Xxm . This is a consequence of the following result:
Proposition 1.6 Let {X 1 , . . . , X m }, {Y 1 , . . . , Y m } be two families of linearly independent smooth vector fields which generate the same distribution on an open set
O M. Then there holds for any integer k 1,




Liek X 1 , . . . , X m (x) = Liek Y 1 , . . . , Y m (x)

x O.

Proof It is sufficient to show that the left-hand side is included in the right-hand side
for any integer k 2. Since the Y j (x) are always linearly independent, there are
j
smooth functions i : O R with i, j = 1, . . . , m, such that
X (x) =
i

m


i (x)Y j (x)

x O, i = 1, . . . , m.

j=1

Then for every i = 1, . . . , m and every smooth vector field Z, there holds

[X , Z] =
i

m

j=1

j
i Y j , Z

m


j
i [Y j , Z]

j=1

m


di (Z)Y j .

j=1





Since Span X 1 (x), . . . , X m (x) Span Y 1 (x), . . . , Y m (x) for any x, this shows
that




x O.
Lie2 X 1 , . . . , X m (x) Lie2 Y 1 , . . . , Y m (x)
We conclude easily by an inductive argument.

Remark 1.3 Since for any smooth vector field X, there holds [X, X] = 0, a one
dimensional distribution cannot be totally nonholonomic.
Example 1.8 The distribution given in Example 1.2 is totally nonholonomic. We
check easily that
i, j = 1, . . . , n,
[X, Y ] = z
which means that has degree 2 everywhere.
Example 1.9 More generally, the distribution given in Example 1.3 is totally nonholonomic of degree 2. We check easily that

1.1 Totally Nonholonomic Distributions

11

[X i , Y j ] = ij z

i, j = 1, . . . , n.

Example 1.10 The Martinet distribution in R3 (with coordinates (x, y, z)) is the
distribution generated by X and Y with
X = x , Y = y +

x2
z .
2

The first Lie bracket of X, Y is given by


[X, Y ] = xz .
For any (x, y, z) R3 with x = 0, the three vectors X(x, y, z), Y (x, y, z),
[X, Y ](x, y, z) are linearly independent. Hence, is a totally nonholonomic distribution of degree 2 on R3 \ {x = 0}. Moreover, since [[X, Y ], Y ] = z , has
degree three on the plane {x = 0}.
Example 1.11 More generally, if X, Y are given by
X = x , Y = y + x l z ,
with l N , we check easily that the distribution generated by X and Y is a totally
nonholonomic distribution of degree l + 1.
Example 1.12 Assume that M has dimension n = 2p + 1 and let be a 1-form on
M satisfying
(d)p = 0
then the distribution given by = Ker() is totally nonholonomic of degree 2. Such
a 1-form is called a contact form and the associated distribution is called a contact
distribution. As a matter of fact, given x M, there is a local set of coordinates
(x1 , . . . , xn ) in an open neighborhood V of x such that has the form

2p


ai dxi + dxn ,

i=1

where a1 , . . . , a2p are smooth scalar function on V such that


ai (x ) = 0

i = 1, . . . , 2p.

Hence, the family of smooth vector fields X 1 , . . . , X 2p given by


X i = xi ai xn

i = 1, . . . , 2p,

12

1 Sub-Riemannian Structures

defines a local frame for = Ker() in V . On the one hand, the n = 2p + 1-form
(d)p at x reads


(d)p x

! aj





a
il
l

dxn dxi1 dxj1 . . . dxip dxjp |x ,


xil
xjl
P2p

l=1,...,p

(1.3)


where P2p denotes the set of p-tuples of the form = (i1 , j1 ), . . . , (ip , jp ) with
{i1 , j1 , . . . , ip , jp } = {1, . . . , 2p} and il < jl for all l = 1, . . . , p. On the other hand,
we check easily that




X i , X j (x ) = xi aj xj ai xn (x )

i, j = 1, . . . , 2p.

Therefore, if there is i {1, . . . , 2p} such that [X i , X j ](x ) = 0 for any j, then
all the products appearing in (1.3) vanish, which implies that ( (d)p )x = 0,
contradiction. We deduce that for every i {1, . . . , n}, there holds




 

Span X 1 (x ), . . . , X 2p (x ), X i , X 1 (x ), . . . , X i , X 2p (x ) = Tx M.

(1.4)

This means that = Ker() is totally nonholonomic of degree 2.

1.2 Horizontal Paths and End-Point Mappings


Horizontal paths. Let be a distribution of rank m n on M. A continuous path
: [0, T ] Rn is said to be an horizontal path with respect to if it is absolutely
continuous with square integrable derivative (see Appendix A) and satisfies


(t) (t) a.e. t [0, T ].
x,T
the set of horizontal paths
For every x M and every T > 0, we denote by
: [0, T ] M starting at x. If admits a global frame X 1 , . . . , X m , then there is a
x,T
and an open subset of L 2 ([0, T ]; Rm ).
one-to-one correspondence between


Proposition 1.7 Let F = X 1 , . . . , X m be a global frame for . Then for every
x,T
x M and every T > 0, there is an open subset UF
of L 2 ([0, T ]; Rm ) such that
the mapping
x,T
x,T
u
,
u UF

1.2 Horizontal Paths and End-Point Mappings

13

(where u : [0, T ] M is the unique solution to the Cauchy problem


u (t) =

m


ui (t)X i (u (t)) a.e. t [0, T ],

u (0) = x)

(1.5)

i=1

is one-to-one.
Proof The set of controls u L 2 ([0, T ]; Rm ) such that the solution u of (1.5) is
well-defined on [0, T ] is a non-empty open set. Moreover, by construction, any path
u is absolutely continuous with square integrable derivative and almost everywhere
tangent to . This proves that the map under study is well-defined on some open set
x,T
L 2 ([0, T ]; Rm ). Let ,x,T be such that there are u, v L 2 ([0, T ]; Rm )
UF
with
m
m


ui (t)X i ( (t)) =
vi (t)X i ( (t))
a.e. t [0, T ].
(t) =
i=1

i=1

Since the tangent vectors X 1 ( (t)) , . . . , X m ( (t)) are always linearly independent
in T (t) M, we infer that u(t) = v(t) for almost every t [0, T ], which proves that
x,T
our map is injective. Furthermore, given
, for almost every t [0, T ],
the
path

is
differentiable
at
t
and
there
is
a
unique
u(t) Rm such that (t) =
"m
i ( (t)). By construction, the function u : [0, T ] Rm belongs to
u
(t)X
i=1 i
L 2 ([0, T ]; Rm ).


Remark 1.4 If M is compact, then solutions to (1.5) are defined for any u
x,T
L 2 ([0, T ]; Rm ), which means that UF
= L 2 ([0, T ]; Rk ).


Given a family of smooth vector fields F = X 1 , . . . , X k on M and x M, T >


x,T
L 2 [0, T ]; Rk is called a control and the solution u :
0, a function u UF
[0, T ] M to the Cauchy problem
u (t) =

k


ui (t)X i (u (t)) a.e. t [0, T ],

u (0) = x

(1.6)

i=1

is called the trajectory starting at x and associated with the control u. Since any horizontal path can be viewed as a trajectory associated to a control system like (1.6), we
restrict in the next paragraph our attention to End-Point mappings associated with
finite families of smooth vector fields.


End-Point mappings. Let F = X 1 , . . . , X k be a family of k 1 smooth vector
x,T
fields on M. As before, given x and T > 0, there is a maximal open subset UF



x,T
2
k
L [0, T ]; R such that for every u UF , there is a unique solution to the Cauchy
problem (1.6). The End-Point mapping associated to F at x in time T > 0 is defined
as follows,
x,T
x,T
: UF
M
EF
u u (T ).

14

1 Sub-Riemannian Structures

x,T
u the time-dependent vector field defined by
Given u UF
, we denote by XF

XF (t, x) :=
u

m


ui (t)X i (x)

a.e. t [0, T ], x M.

i=1
u (t, x) is well-defined and smooth on a neighbourhood of x; we denote by
Its flow F
u
Dx F (t, x) its differential at (t, x) with respect to the x variable. The following result
holds. (We refer the reader to Appendix A for reminders in differential equations and
to Appendix B for reminders in differential calculus in infinite dimension.)
x,T
x,T
Proposition 1.8 The End-Point mapping EF
is of class C 1 on UF
and for every
x,T
control u UF , its differential at u,
x,T
Du EF
: L 2 ([0, T ]; Rk ) TE x,T (u) M
F

is given by
#
x,T

Du EF (v) = Dx F (T , x)

T

u
Dx F
(t, x)

1

 x,t 
v
t, EF (u) dt
XF

(1.7)

for every v L 2 ([0, T ]; Rk ).


Proof Any smooth manifold can be smoothly embedded in an Euclidean space.
Then without loss of generality we can assume that M is a smooth submanifold of
some RN and consequently that the X i s are the restrictions of smooth vector fields
x,T
X 1 , . . . , X k which are defined in an open neighborhood of M in RN . Given u UF
and v L 2 ([0, T ]; Rk ) let us look at


1  x,T 
x,T  
EF u + v EF
u .
0
lim

Using the previous notations, we have


#

u+v (T ) =
0

k




(ui (t) + vi (t)) X i u+v (t) dt,

(1.8)

i=1

with u+v (0) = x. For every i = 1, . . . , k and every t [0, T ], the Taylor expansion
of each X i at u (t) gives






X i u+v (t) = X i u (t) + Du (t) X i u+v (t) u (t) + |u+v (t) u (t)| o(1).
Setting x (t) := u+v (t) u (t) for any t, we may assume that x has size , then
(1.8) yields formally

1.2 Horizontal Paths and End-Point Mappings

x (T ) =
0

$ k


15

ui (t)Du (t) X x (t) +

i=1

m


%


vi (t)X u (t) dt + o().
i

i=1

This suggests that the linear part in of the function t [0, T ] x (t) should be
solution to the Cauchy problem
(t) =

 k



i

ui (t)Du (t) X

i=1

(t) +

 k





vi (t)X u (t)
i

a.e. t [0, T ],

(1.9)

i=1

with (0) = 0. Using Gronwalls


Lemma (see Appendix A) we check easily that for

every v L 2 [0, T ]; Rk , the quantity


1  x,T 
x,T  
EF u + v EF
u (T )

tends to zero as tends to zero. For almost every


" t [0, T ], denote by Au (t) the
matrix in MN (R) representing the linear operator ki=1 ui (t)Du (t) X i in the canonical
basis of RN and for every t [0, T ], denote by Bu (t) the matrix in MN,k (R) whose
columns are the X i (u (t))s. Denote by Su : [0, T ] MN (R) the solution to the
Cauchy problem
S u (t) = Au (t)Su (t) a.e. t [0, T ], Su (0) = In .
Note that Su (t) is exactly the Jacobian of the flow u (with F = {X 1 , . . . , X k }) at
F
(t, u (t)) with respect to the x variable. The solution of (1.9) at time T is given by
(see Appendix A)
#
x,T
(T ) = Du EF
(v) = Su (T )

Su (t)1 Bu (t)v(t)dt.

x,T
is differentiable at u and (1.7) is satisfied. Using Gronwalls lemma again,
Thus EF
x,T
x,T
depends continuously on u on UF
.


we leave the reader to check that Du EF
x,T
Remark 1.5 If M = Rn , the derivative of EF
at u is given by

x,T

Du EF (v) = S(T )

S(t)1 B(t)v(t)dt,

where S : [0, T ] Mn (R) is the solution to the Cauchy problem


= A(t)S(t) a.e. t [0, T ], S(0) = In
S(t)

(1.10)

16

1 Sub-Riemannian Structures

and where the matrices A(t) Mn (R), B(t) Mn,k (R) are defined by
A(t) :=

k




ui (t)JX i u (t)

a.e. t [0, T ]

(1.11)

i=1
x,t
(u) and JX i denotes the Jacobian matrix of X i at u (t)) and
(u (t) = EF



B(t) := X 1 (u (t)), . . . , X k (u (t))

t [0, T ].

(1.12)

x,T
, we set
Properties of End-Point mappings. Given u UF


x,T  2
k
Imx,T
F (u) := Du EF L ([0, T ]; R ) .
x,T
Defining y = EF
(u), we observe that Imx,T
F (u) is a vector space contained in Ty M,
x,T
x,T
hence of dimension n. We call rank of u UF
with respect to EF
, denoted by
x,T
x,T
rankF (u), the dimension of ImF (u).
Given u  L 2 ([0, T ]; Rk ) and > 0, we define the controls u L 2 ([0; 1 T ];Rk )
and u L 2 [0, T ]; Rk by

a.e. t [0, 1 T ],

u (t) := u(t)

and u (t) := u(T t)

a.e. t [0, T ].

Moreover, if in addition u L 2 ([0, T ]; Rk ) we define u u , the concatenation of u


and u , in L 2 [0, T + T  ]; Rk ) by
&

u u (t) =

u(t)
if 0 t T ;
u (t T ) if T < t T + T 

a.e. t [0, T + T  ].

x,T
x,T
Proposition 1.9 Let u UF
, u UF
and > 0 be fixed, then we have (we set
x,T
y := EF (u)):
1

x, T
x, T
(i) u UF
and rankx,T
(u ).
F (u) = rankF
y,T
y,T
x,T
(ii) u belongs to UF and rankF (u) = rankF (u).




x,T +T 
+T  
 max rankx,T (u), ranky,T (u ) .
u

u
and rankx,T
(iii) u u UF
F
F
F

Proof To prove (i), we note that if u : [0, T ] M is a solution of (1.6),


 then the

path u, : [0, 1 T ] M defined by u, (t) = u (t) for any t 0, 1 T ,
satisfies for a.e. t [0, 1 T ],

1.2 Horizontal Paths and End-Point Mappings

u, (t) = u (t) =

k


17

ui (t)X i (u (t)) =

k


i=1

i=1
1 T

x,
We infer that u belongs to UF
1 T

x,
Du EF



u (t)X i u, (t) .

and satisfies


v L 2 [0, T ]; Rk .

x,T
(v)
(v ) = Du EF

We conclude easily. To prove (ii), we note that


E x,T (u),T

EFF

 
u = x

x,T
u UF
.

z,T
The mapping (z, v) EF
(v) is C 1 and its derivative with respect to the z variable


x,T
u (T , x)1 . Derivating the above equality at u
(u), u is given by Dx F
at y = EF
yields
x,T
(v) + Du EF
(Dx F )u (T , x)1 Du EF

y,T

 
v = 0



v L 2 [0, T ]; Rk .

The result follows easily. We now observe that




E x,T (u),T 

x,T +T
EF
(u u ) = EFF

(u ),

y,T 

x,T
x,T
and u UF with y = EF
(u). Again, derivating yields
for any u UF


y,T 
x,T +T  
x,T
u
v v = Dx F
(T  , y) Du EF
(v) + Du EF (v ),
Duu EF

(1.13)

for any v L 2 ([0, T ]; Rk ) and v L 2 ([0, T  ]; Rk ). We conclude easily.

As the next example shows, the inequality in Proposition 1.9 may be strict.
Example 1.13 Let F = {X 1 , X 2 } be the family of smooth vectors fields on R4 (with
coordinates x = (x1 , x2 , x3 , x4 ) and canonical basis (e1 , e2 , e3 , e4 ) ) defined by
X 1 = x1 and X 2 = x2 + x12 x3 + x1 x2 x4 .


Set x = (1, 0, 0, 0), y = (0, 0, 0, 0), and define the controls u, u L 2 [0, 1]; R2
by
u(t) = (1, 0) and u (t) = (0, 1)

t [0, 1].

x,1
. As a matter of fact, the trajectory
The control u has rank 3 with respect to EF
4
u : [0, 1] R starting at x and associated with u equals u (t) = (1 + t, 0, 0, 0)
for any t [0, 1] and using the representation formula given in Remark 1.5, we have

18

1 Sub-Riemannian Structures

#
Du EF (v) =



v L 2 [0, 1]; R2 ,

x,1

B(t)v(t)dt
0



where B(t) = X 1 (u (t)), X 2 (u (t)) for any t [0, 1]. Then,
&#
x,1

ImF (u) = Span

'
v1 (t)dt e1 | v1 L ([0, 1]; R)
2

&#

+ Span

v2 (t)dt e2 +

'
(1 t) v2 (t)dt e3 | v2 L ([0, 1]; R)
2

= Span {e1 , e2 , e3 } .
The trajectory u : [0, 1] R4 starting at y and associated with u equals u (t) =
(0, t, 0, 0) for any t [0, 1], and there holds
#

y,1

Du EF (v) = S(T )


where

S(t)1 B(t)v(t)dt,

1 000
1
0
0 1 0 0

S(t) =
0 0 1 0 and B(t) = 0
t2
0
2 0 0 1

0
1

0
0

t [0, 1].

We infer that

y,1

ImF

(u ) = Span

*#
1

+ Span

+
v2 (t)dt e2 | v2 L 2 ([0, 1]; R)

*#
1
0

v1 (t)dt e1 +

# 1$
0

t2
1
2

%
v1 (t)dt e4 | v1 L 2 ([0, 1]; R)

= Span {e1 , e2 , e4 } .

Finally, we note that




u
(1, y)(e3 ) = e3 .
Dx F

4
Therefore, by (1.13), this implies Imx,2
F (u u ) = R , which means that the rank


y,1 
x,1

rankx,2
F u u = 4 is strictly larger than the maximum of rankF (u) and rankF (u )
which is equal to 3.

The following proposition implies that the rank of a control is always larger or
equal than the dimension of the family {X 1 , . . . , X k } at the end-point.

1.2 Horizontal Paths and End-Point Mappings

19

x,T
Proposition 1.10 We have for every u UF
,



x,T
(u) Imx,T
X i EF
F (u)

i = 1, . . . , k.

Proof Let us first assume that we work in Rn . In this case (see Remark 1.5), the
x,T
at u is given by
derivative of EF
#
x,T
Du EF
(v) = S(T )

S(t)1 B(t)v(t)dt



v L 2 [0, T ]; Rk ,

where S() is the solution to the Cauchy problem (1.10) and where the matrices
A(t) Mn (R), B(t) Mn,k (R) are defined respectively by (1.11) and (1.12). Fix
k
of the
i {1, . . . , k} and denote by ei the i-th vector

 canonical basis in R . Define,
2
k
for every (0, T ), the control v L [0, T ]; R by
&
v (t) =

0
if 0 t T ;
(1/)ei if T < t T .

We have
,
,
,
,
x,T
,Du EF (v ) X i (u (T )),
,
# T
#
,
1 i
,
= ,(1/)S(T )
S(t) X (u (t) dt (1/)S(T )
T
T

(1/) |S(T )|

#
(1/) |S(T )|
+ (1/) |S(T )|

T
T
T
# T

T
T

S(T )

,
,
X (u (T )) dt ,,

1 i

,
,
,
,
,S(t)1 X i (u (t)) S(T )1 X i (u (T )), dt
-,
,
-,
,
-S(t)1 - ,X i (u (t)) X i (u (T )), dt

,
-,
,
-,
-S(t)1 S(T )1 - ,X i (u (T )), dt.

Both mappings t X i (xu (t)) and t S(t)1 are continuous at t = T . Therefore,


there holds
x,T
(v ) = X i (u (T )).
lim Du EF
0


x,T  2
k
k
Since Imx,T
F (u) = Du EF L ([0, T ]; R ) is a closed subset of R , we infer that
X i (u (T )) belongs to Imx,T
F (u).

20

1 Sub-Riemannian Structures

If we are now in M, then there exists a local chart around x and t (0, T ) such that
u (t ) O. Set T  := T t and define u1 L 2 ([0, t ]; Rk ) and u2 : L 2 ([0, T  ]; Rk ) by
u1 (t) = u(t)

t [0, T  ].

t [0, t ] and u2 (t) = u(t + t )

We conclude easily by the above proof in Rn together with (1.13).

1.3 Regular and Singular Horizontal Paths


Regular and singular controls. Let F = {X 1 , . . . , X k } be a family of k 1 smooth
x,T
is
vector fields on M. Given x M and T > 0, we say that the control u UF
x,T
regular with respect to x and F if rankF (u) = n (recall that M has dimension n).
Otherwise, we shall say that u is singular. In other terms, u is singular if and only if
x,T
x,T
, that is if EF
is not a submersion
it is a critical point of the End-Point mapping EF
at u.
Remark 1.6 Proposition 1.10 shows that if F = {X 1 , . . . , X k } is a family of smooth
vector fields on M such that


x M,
Span X 1 (x), . . . , X k (x) = Tx M
x,T
with u = 0, T > 0)
then every non-trivial admissible control (that is u in some UF
is regular.
x,T
Propositions 1.9 shows that a given control u UF
is singular with respect to x
1 T

x,
and F if and only if any control of the form u UF

(with = 0) is singular

y,T

x,T
with respect to x and F and if and only if u UF (with y = EF
) is singular
with respect to y and F . It also shows that if the concatenation of several controls is
singular then each of them is singular.

Define k Hamiltonians h1 , . . . , hk : T M R by hi := hX i for any i = 1, . . . , k,


that is
= (x, p) T M, i = 1, . . . , k.
hi () = p X i (x)

For every i = 1, . . . , m, h i denotes the Hamiltonian vector field on T M associated


to hi , which in local coordinates on T M reads

h i (x, p) =

hi
hi
(x, p),
(x, p) .
p
x

Singular controls can be characterized as follows.

1.3 Regular and Singular Horizontal Paths

21

x,T
Proposition 1.11 The control u UF
is singular with respect to x and F if and
only if there exists an absolutely continuous arc : [0, T ] T M that never
intersects the zero section of T M, such that
k


ui (t) h i ((t))

a.e. t [0, T ]

(1.14)

hi ((t)) = 0, t [0, T ]

i = 1, . . . , k.

(1.15)

(t)
=

i=1

and

We say that is an abnormal extremal lift of u : [0, T ] M (defined by (1.6)).


x,T
Proof Let us first assume that we work in Rn . If Du EF
: L 2 ([0, T ]; Rk ) Rn is
n

not surjective, then there exists p (R ) \ {0} such that


x,T
(v) = 0
p Du EF



v L 2 [0, T ]; Rk .

Remembering Remark 1.5, the above identity can be written as


#

pS(T )S(t)1 B(t)v(t)dt = 0

v L 2 ([0, T ]; Rk ).

Taking v L 2 ([0, T ]; Rk ) defined as




v(t) = pS(t)S(t)1 B(t)

t [0, T ],

we deduce that
#
0

,
 ,2
,
,
, pS(T )S(t)1 B(t) , ds = 0,

which implies that pS(T )S(t)1 B(t) = 0 for any t [0, T ] (note that the function
t pS(T )S(t)1 B(t) is continuous). Let us now define, for each t [0, T ],
p(t) := pS(T )S(t)1 .
By construction, p : [0, T ] (Rn ) is an absolutely continuous arc. Since p = 0
and S(t) is invertible for all t [0, T ], p(t) does not vanish on [0, T ]. Moreover,
recalling that, by definition of S,
d
S(t)1 = S(t)1 A(t)
dt

a.e. t [0, T ],

22

1 Sub-Riemannian Structures

we conclude that p satisfies the following properties:


p (t) = p(t)A(t)

a.e. t [0, T ]

and
p(t)B(t) = 0

t [0, T ]

which shows that (1.14)(1.15) are satisfied with (t) = (u (t), p(t)) for any t
[0, T ). By the way, we note that by construction, we have for every t (0, T ],
x,t
(v) = 0
p(t) Dut EF



v L 2 [0, t]; Rk ,

(1.16)

where ut denotes the restriction of u to [0, t].


Conversely, let us assume that there exists an absolutely continuous arc p :
[0, T ] (Rn ) \ {0} such that (1.14) and (1.15) are satisfied with = (u , p).
This means that
p(t) = p(t)A(t)
a.e. t [0, T ]
and
p(t) B(t) = 0

t [0, T ].

Setting p := p(T ) = 0, we have, for any t [0, T ],


p(t) = pS(T )S(t)1 .
Hence, we obtain
pS(T )S(t)1 B(t) = 0

t [0, T ],

which in turn implies


x,T
(v) = 0, v L 2 ([0, T ]; Rk ).
p Du EF

This concludes the proof. Again, the above proof shows indeed that (1.16) holds for
any t (0, T ].
Assume now that we work on M. We can cut the path u : [0, T ] M associated
with u and u itself into a finite number of pieces 1 , . . . , l and u1 , . . . , ul such
that each control ul is singular and each path l is valued in a chart of M. Then
we can apply the previous arguments on each chart and thanks to (1.16) obtain a
non-vanishing absolutely continuous arc satisfying (1.14)(1.15) on [0, T ].


Remark 1.7 We keep in mind that if : [0, T ] T M is an absolutely continuous
arc satisfying (1.14)(1.15), then

1.3 Regular and Singular Horizontal Paths

23

Fig. 1.2 The concatenation 1 2 3 of three paths


x,t
p(t) Dut EF
(v) = 0



v L 2 [0, t]; Rk ,

where (t) = (u (t), p(t)) and ut denotes the restriction of u to [0, t]. (In the sequel,
v or p v with = (x, p) in local coordinates denotes the evaluation of the form
at v Tx M.)
Remark 1.8 In local coordinates, Proposition
  1.11 means that there exists an
absolutely continuous arc p : [0, T ] Rn \ {0} satisfying
p (t) =

k


ui (t) p(t) Du (t) X i

a.e. t [0, T ]

(1.17)

t [0, T ], i = 1, . . . k.

(1.18)

i=1

and


p(t) X i u (t) = 0

Regular and singular paths.Regular Let be a distribution of rank m n on M.


As seen before, it can be represented by a generating family F = {X 1 , . . . , X k } of
smooth vector fields (see Proposition 1.3). Given a point x M, a time T > 0, and
x,T
, we set
an horizontal path

x,T  2
Im ( ) := Du EF
L ([0, T ]; Rk ) TE x,T (u) M,
F

x,T
is any control such that = u is solution to the Cauchy problem
where u UF
x,T
(1.6). We call rank of
, denoted by rank ( ), the dimension of Im (u).
We shall say that is singular (with respect to ) if rank ( ) < n and regular
otherwise.

Remark 1.9 By Remark 1.6, if has rank m = n then any non-trivial horizontal
path is regular.
Proposition 1.9 does apply to horizontal paths. The rank of an horizontal path
depends only on the curve drawn by the path in M, it does not depend upon its
parametrization. Moreover if an horizontal path which is the concatenation of several
paths (the concatenation of paths is defined in the same way as the concatenation of
controls, see Fig. 1.2) is singular then each piece is necessarily singular.
Example 1.14 Returning to Examples 1.2 and 1.8, we consider in R3 with coordinates x = (x1 , x2 , x3 ), the totally nonholonomic rank two distribution generated by

24

1 Sub-Riemannian Structures

X 1 = x1

x2
x
2 3

and

X 2 = x2 +

x1
x .
2 3

We claim that the singular horizontal paths are the constant curves or equivalently
that the only singular control with respect to F = {X 1 , X 2 } is the control u 0. Let
x,T
be a singular control.
us prove this claim. Let x R3 , T > 0 be fixed and u UF
3
Denote by x : [0, T ] R the solution to the Cauchy problem
x (t) = u1 (t)X 1 (x(t)) + u2 (t)X 2 (x(t)) a.e. t [0, T ], x(0) = x.

(1.19)

 
From Proposition 1.11, there exists an absolutely continuous arc p : [0, T ] R3 \
{0} such that
p (t) = u1 (t) p(t) Dx(t) X 1 u2 (t) p(t) Dx(t) X 2

(1.20)

for a.e. t [0, T ] and


p(t) X 1 (x(t)) = p(t) X 2 (x(t)) = 0

t [0, T ].

(1.21)

Taking the derivatives in (1.21) gives




p (t) X i (x(t)) + p(t) Dx(t) X i x (t) = 0

a.e. t [0, T ], i = 1, 2

which implies, by (1.19)(1.20),


u1 (t) p(t) [X 1 , X i ](x(t)) + u2 (t) p(t) [X 2 , X i ](x(t)) = 0

a.e. t [0, T ].

Taking i = 1 and i = 2, we obtain that for almost every t [0, T ],


u1 (t) p(t) [X 1 , X 2 ](x(t)) = u2 (t) p(t) [X 1 , X 2 ](x(t)) = 0.
Since [X 1 , X 2 ] = x 3 and (1.21) is satisfied with p(t) = 0, we deduce that u 0.
Example 1.15 The property of the previous example is satisfied by much more
general distributions. A distribution on M is called fat if, for every x M and
every section X of with X(x) = 0, there holds


Tx M = (x) + X, (x),
where





X, (x) := [X, Z](x) | Z section of .

(1.22)

1.3 Regular and Singular Horizontal Paths

25

The condition above being very restrictive, there are very few fat distributions. Fat
distributions on three-dimensional manifolds are the rank-two distributions satisfying

 

x V ,
Tx M = Span X 1 (x), X 2 (x), X 1 , X 2 (x)
where (X 1 , X 2 ) is a local frame for in V . Another example of co-rank one fat distributions in odd dimension is given by contact distributions which were introduced
in Example 1.12. In this case property (1.22) is an easy consequence of (1.4). Let us
now prove that fat distributions do not admit non-trivial singular horizontal paths.
By the property of singular concatenated horizontal paths, we just need to show
that non-constant short horizontal paths cannot be singular. Taking a local chart if
necessary we can work in Rn and assume that has a local frame X 1 , . . . , X m . Let
x,T
be a singular control. By Remark 1.8, there
x Rn , T > 0 be fixed and u UF
 
exists an absolutely continuous arc p : [0, T ] Rn \ {0} satisfying (1.17) and
(1.18). For almost every fixed t [0, T ] and every i = 1, . . . , m, derivating (1.18)
yields
m

j=1

m






uj (t) p(t) X j , X i u (t) = p(t)
uj (t)X j , X i u (t) = 0.
j=1

"
Setting the autonomous vector field X() := m
u (t)X j (), we deduce that p(t)



 j=1
 j
i
i
annihilates all the X u (t) s and all the X, X u (t) s. This contradicts (1.22).
Example 1.16 Returning to Example 1.10 (Martinet distribution), we consider in R3
with coordinates x = (x1 , x2 , x3 ), the totally nonholonomic rank two distribution
generated by
x2
X 1 = x1 and X 2 = x2 + 1 x3 .
2
We claim that the singular horizontal curves are exactly the traces of the distribution
on the so-called Martinet surface (see Fig. 1.3)


:= x R3 | x1 = 0 ,
which in other terms means that the singular horizontal paths are either constant
curves or are contained in a line lz of the form


lz = x = (x1 , x2 , x3 ) R3 | x1 = 0 and x3 = z
for some z R.
x,T
be a non-trivial
Let us prove this claim. Let x R3 , T > 0 be fixed and u UF
3
singular control. Denote by x : [0, T ] R the solution to the Cauchy problem

26

1 Sub-Riemannian Structures

Fig. 1.3 The singular horizontal curves are the traces of the distribution on the Martinet surface

x (t) = u1 (t)X 1 (x(t)) + u2 (t)X 2 (x(t)) a.e. t [0, T ], x(0) = x.


As in the previous example,
Proposition 1.11, there exists an absolutely con from

tinuous arc p : [0, T ] R3 \ {0} such that


p (t) = u1 (t) p(t) Dx(t) X 1 u2 (t) p(t) Dx(t) X 2

(1.23)

for a.e. t [0, T ] and


p(t) X 1 (x(t)) = p(t) X 2 (x(t)) = 0

t [0, T ].

(1.24)

We deduce that

2
|u(t)|2 p(t) [X 1 , X 2 ](x(t)) = 0

a.e. t [0, T ].

Since the three vectors X 1 (x), X 2 (x), [X 1 , X 2 ](x) span R3 for every x with x1 = 0,
this shows that x1 (t) = 0 for all t [0, T ], which in turn implies that u1 0. We
deduce that x has the form


# t
u2 (s)ds, 0, x3 (0) ,
x(t) = 0, x2 (0) +
0
x,T
which shows that it is contained in lx3 (0) . Conversely, if an horizontal path x
has the form

x(t) = (0, x2 (t), z)

t [0, T ]

with z R, then any absolutely continuous arc p : [0, T ] R3 \ {0} of the form
p(t) = (0, 0, p3 )

t [0, T ]

1.3 Regular and Singular Horizontal Paths

27

with p3 = 0 satisfies (1.23) and (1.24). This shows that any horizontal path which is
contained in a line lz for some z R is singular.
Example 1.17 More generally, consider a totally nonholonomic distribution of
rank two in a manifold M of dimension three. We define the Martinet surface of
as the set defined by


:= x M | (x) + [, ](x) = Tx M ,
where





, (x) := [X, Y ](x) | X, Y sections of .

In other terms, a point x M belongs to if and only if is not a contact


distribution at x, that is if for any (or for only one) local frame {X 1 , X 2 } in a neighborhood of x the three vectors X 1 (x), X 2 (x), [X 1 , X 2 ](x) do not span Tx M. The
singular paths with respect to are exactly the horizontal paths which are contained
in . Let us prove this claim. The fact that singular curves are necessary included
in follows by the same argument an in Example 1.14. Let us now prove that any
horizontal path which is included in is singular. Let : [0, T ] M such a path
be fixed, set (0) = x, and consider a local frame {X 1 , X 2 } for in a neighborhood
V of x. Let > 0 be small enough so that (t) V for any t [0, ], in such a way
that there is u L 2 ([0, ]; R2 ) satisfying
(t) = u1 (t)X 1 ( (t)) + u2 (t)X 2 ( (t))

a.e. t [0, ].

Taking a change of coordinates if necessary, we can assume that we work in R3 . Let


p0 (R3 ) \ {0} be such that p0 X1 (x) = p0 X2 (x) = 0, and let p : [0, ] (R3 )
be the solution to the Cauchy problem
p (t) =

ui (t) p(t) D (t) X i

a.e. t [0, ], p(0) = p0 .

i=1,2

Define two absolutely continuous function h1 , h2 : [0, ] R by


hi (t) = p(t) X i ( (t))

t [0, ], i = 1, 2.

As above, for every t [0, ] we have





d 
p(t) X 1 ( (t)) = u2 (t) p(t) [X 1 , X 2 ] (t)
h 1 (t) =
dt
and


h 2 (t) = u1 (t) p(t) X 1 , X 2 ( (t)).

28

1 Sub-Riemannian Structures

But since (t) for every t, there are two continuous functions 1 , 2 : [0, ]
R such that



X 1 , X 2 ( (t)) = 1 (t)X 1 ( (t)) + 2 (t)X 2 ( (t))

t [0, ].

This implies that the pair (h1 , h2 ) is a solution of the linear differential system

h 1 (t) = u2 (t)1 (t)h1 (t) u2 (t)2 (t)h2 (t)


h (t) = u (t) (t)h (t) + u (t) (t)h (t).
2
1
1
1
1
2
2
Since h1 (0) = h2 (0) = 0 by construction, we deduce by the Cauchy-Lipschitz
Theorem that h1 (t) = h2 (t) = 0 for any t [0, ].In that way, we have constructed

an absolutely continuous arc p : [0, ] R3 \ {0} satisfying (1.17)(1.18)


(with u = ). We can repeat this construction on a new interval of the form [, 2]
(with initial condition p()) and finally obtain an absolutely continuous arc satisfying
(1.17)(1.18) on [0, T ]. By Proposition 1.11, we conclude that is singular.
Example 1.18 Consider in R4 the two smooth vector fields X 1 , X 2 given by
X 1 = x1 , X 2 = x2 + x1 x3 + x3 x4 .
These two vector fields are always linearly independent in R4 . Moreover we have
[X 1 , X 2 ] = x3 ,



X 2 , [X 1 , X 2 ] = x4 .

Therefore the family F = {X 1 , X 2 } spans a totally nonholonomic distribution


of rank two in R4 . Let us look at singular horizontal paths of or equivalently at
x,T
with x R4 and T > 0.
singular controls with respect to End-Point mapping EF
x,T
Let u UF
be a control satisfying |u(t)| = 1 for a.e. t [0, T ]. This control is
 
singular if and only if there is an arc p = (p1 , p2 , p3 , p4 ) : [0, T ] R4 \ {0}
which satisfies (1.17) and (1.18). Denoting by x = (x1 , x2 , x3 , x4 ) : [0, T ] R4 the
trajectory uniquely associated to x and u, (1.17) yields

x 1 (t) =

x 2 (t) =
x 3 (t) =

x 4 (t) =

u1 (t)
u2 (t)
u2 (t)x1 (t)
u2 (t)x3 (t),

p 1 (t) = u2 (t)p3 (t)

p 2 (t) = 0
p 3 (t) = u2 (t)p4 (t)

p 4 (t) = 0,

for a.e. t [0, T ], while (1.18) yields


p1 (t) = p2 (t) + x1 (t)p3 (t) + x3 (t)p4 (t) = 0

t [0, T ].

(1.25)

1.3 Regular and Singular Horizontal Paths

29

System (1.25) implies that p2 and p4 are constant on [0, T ]. If p4 = 0, then (1.25)
also implies that p3 is constant on [0, T ]. Hence we obtain that p2 + x1 (t)p3 = 0 for
every t [0, T ]. Which means that either x1 is constant or p2 = p3 = 0. Since p
does not vanish on [0, T ], we deduce that x1 is constant, which means that u1 0.
But u2 (t)p3 = 0 for almost every t, hence p3 = 0 (remember that |u(t)| = 1 a.e.
t [0, T ]). We obtain a contradiction. Therefore, p4 = 0, hence we deduce easily that


p 3 (t)
p3 (t) = 0
a.e. t [0, T ].
0 = u2 (t)p3 (t) =
p4
Since p3 is absolutely continuous, this means that it is constant on [0, T ]. This implies
that u2 (t) = 0 for all t [0, T ]. Then, the curve x has the form


x(t) = x1 (t), x2 (0), x3 (0), x4 (0)

t [0, T ].

In conclusion, a singular curve passes through each point in R4 .


Example 1.19 The previous phenomena happens for more general rank two distributions in dimension four. Let be a rank two distribution on a four-dimensional
manifold M such that for every x M, there holds
(x) + [, ](x) has dimension three
and


Tx M = (x) + [, ](x) + , [, ] (x)

x M,

where





, [, ] (x) := X, [Y , Z] (x) | X, Y , Z sections of .
As above, we can work locally, so let us consider a frame {X 1 , X 2 } and a trajectory
2
2
x : [0, T ] R4 associated to some control
 4 u L ([0, T ]; R ). If x is singular
\ {0} satisfying (1.17) and (1.18).
(with respect to ), there is p : [0, T ] R
Derivativing (1.18) two times yields for almost every t [0, T ] with u(t) = 0,



p(t) X 1 , X 2 x(t) = 0

and



 

 



u1 (t) p(t) X 1 , X 1 , X 2 x(t) + u2 (t) p(t) X 2 , X 1 , X 2 x(t) = 0.

(1.26)

(1.27)

Since M has dimension four and + [, ]] has dimension three, there is (locally)
a smooth non-vanishing 1-form whose kernel is equal to + [, ]. Then, by
(1.18) and (1.26) and p(t) are colinear along x, and in turn by (1.27) we have for
almost every t [0, T ] with u(t) = 0,

30

1 Sub-Riemannian Structures



 

 



u1 (t) x(t) X 1 , X 1 , X 2 x(t) + u2 (t) x(t) X 2 , X 1 , X 2 x(t) = 0.
By the above assumptions, for every x, the linear form






 
 
(1 , 2 ) R2 x X 1 , X 1 , X 2 (x) 1 + x X 2 , X 1 , X 2 (x) 2
has a kernel of dimension one. This shows that there is a smooth line field (a distribution of rank one) L on M such that the singular curves are exactly the integral
curves of L.

1.4 The Chow-Rashevsky Theorem


Openness of End-Point mappings. The following result will imply easily the
Chow-Rashevsky Theorem. We recall that a map is said to be open if the image
of any open set is open.


Proposition 1.12 Let F = X 1 , . . . , X k be a family of smooth vector fields on M
satisfying the Hrmander condition on M. Then for every x M and every T > 0,
x,T
x,T
: UF
M is open.
the End-Point mapping EF
Proof Let x M and T > 0 be fixed. Set for every > 0,


x,
d() = max rankx,
(u)
|
u

U
s.t.
u
2 < .
L
F
F
By Proposition 1.9 (iii), the function (0, +) d() is nondecreasing with
values in N. So, there is 0 and d0 N such that d() = d0 for any (0, 0 ).
Since F satisfies the Hrmander condition at x, the vector space spanned by
{X 1 (x), . . . , X k (x)} has dimension 1. Then, thanks to Proposition 1.10, there holds
d() = d0 1

[0, 0 ].

x,

such that u L2 < and rankx,


Let (0, 0 ) and u UF
F (u ) = d0 be fixed.
1
d
2
k
0
There are d0 controls v , . . . , v in L ([0, ]; R ) such that the linear map

L : Rd0 Tx M



x, "d0
j vj
= 1 , . . . , d0 Du EF

j=1


is injective. By construction and the fact that the mapping u rankx,


F (u) is lower
) = d as soon as u
(u
semicontinuous, the rank of any control u is equal to rankx,
0
F
is close enough to u in L 2 ([0, ]; Rk ). Hence, there is an open neighborhood V of
0 Rd0 where the mapping

1.4 The Chow-Rashevsky Theorem

31

E : V M


" 0 j j
x,
EF
u + dj=1
v

is an embedding whose image is a submanifold N of class C 1 in M of dimension d0 .


Moreover by construction again, there holds for every small Rd0 ,

u +
Imx,
F

d0




j vj = D E Rd0 = TE () N.

j=1

By Proposition 1.10, we infer that X i (y) belongs to Ty N for any i = 1, . . . , k and


y N.
Lemma 1.13 Let be an open subset of Rl (l 2) and S be a submanifold of
of class C 1 . Let X, Y be two smooth vector fields on such that
X(x), Y (x) Tx S

x S .

Then [X, Y ](x) Tx S for any x S .


Proof (Proof of Lemma 1.13) As in Proposition 1.5, we denote respectively by etX
and etY the flows of X and Y . Since by assumption X and Y is always tangent to S ,
etX (x) and etY (x) belong to S for any x S and any t small. Therefore


etY etX etY etX (x) S

x S and t small.

By Proposition 1.5, we infer that [X, Y ](x) Tx S for any x S .

From the above lemma it follows that all the brackets involving X 1 , . . . , X k at
y N belong to Ty N. Since F satisfies the Hrmander condition, this shows that
d0 = n and indeed that d() = n for any > 0.
x,T
, v O and > 0 to be chosen later. Since
Let O be an open subset of UF
x,
d() = n, there is u UF such that uL2 < and rankx,
F (u) = n. Define the
control v L 2 ([0, T ]; Rk ) by
v = u u v

T
T 2

The trajectory associated with v is the concatenation of the curve xu starting at x and
associated with u, xu starting at xu () and associated with u , and a reparametrization
of xv starting at x and associated with v (see Fig. 1.4).
x,T
, is regular and satisfies
By construction (see Proposition 1.9), v belongs to UF
x,T
x,T
EF
(v) = EF
(v). Then, as above, the image of a small ball centered at the origin
x,T
in L 2 ([0, T ]; Rk ) by the mapping w EF
(v + w) is an open neighboorhood of
-2
x,T
x,T
E (v) = E (v). Furthermore, the quantity -v v- 2 is equal to
F

32

1 Sub-Riemannian Structures

Fig. 1.4 The trajectory associated with v

#
|u(t) v(t)|2 dt +

|u(2 t) v(t)|2 dt

,
,2
%

$ 
,
,
T t 2
T
,
,
v
v(t), dt
,
, T 2
,
T 2

and consequently bounded by


#
2
0

|u(t)|2 dt +

# 2
0

|v(t)|2 dt 2

#
0

u(t), v(t)
dt + 2

# 2

u(2 t), v(t)


dt

,2
%
,2

2 # T ,, $ 

# T ,
,
,
,
T t 2
T
T
,
,
,
,
+
u(t), dt
,u
, T 2 u(t) u(t), dt + T 2
,
T 2
2
2 ,
0
%

# T / $ 



T t 2
T
T
u(t) u(t) dt.
u
+2
u(t),
T 2
T 2
T 2
2

Then since uL2 < and both functions



t [2, T ]

T
u(t) u(t)
T 2

and

t [2, T ]

$ 
%
T t 2
u
u(t)
T 2

tend to zero in L 2 , we infer that v belong to O if is small enough. This shows that
x,T
x,T
x,T
(O) contains a neighborhood of EF
(v) = EF
(v).


EF
Statement and proof. The aim of the present section is to prove the following result.
Theorem 1.14 (Chow-Rashevskys Theorem) Let be a totally nonholonomic
distribution on M (assumed to be connected). Then, for every x, y M and every
x,T
T > 0, there is an horizontal path
such that (T ) = y.
Thanks to the above discussion, the Chow-Rashevsky Theorem will be a straightforward consequence of the following result.


Theorem 1.15 Let F = X 1 , . . . , X k be a family of smooth vector fields on M.
Assume that M is connected and that F satisfies the Hrmander condition on M.
x,T
such that the
Then for every x, y M and every T > 0, there is a control u UF
solution of (1.6) satisfies u (T ) = y.

1.4 The Chow-Rashevsky Theorem

33

Fig. 1.5 Proof of the Chow-Rashevsky Theorem

Proof Let x and T > 0 be fixed. Denote by AF (x, T ) the set of points in M which
x,T
, that is
can be joined from x by a control in UF


x,T
x,T
UF
.
AF (x, T ) = EF
By Proposition 1.12, AF (x, T ) is an open set in M. Let us show that this set is
closed as well. Let {zk }k be a sequence of points in M converging to some
 z M. By
z,1
z,1
z,1
z,1
is an
openness of the mapping EF and the fact that EF (0) = z, the set EF UF
neighborhood of z. Then, there is k large enough such that zk belongs to that set.
The concatenation of uk together with u steers x to z (see Fig. 1.5). This shows
that AF (x, T ) is closed in M. In conclusion AF (x, T ) is open, closed and nonempty


(it contains x). By connectedness of M, we infer that AF (x, T ) = M.
Remark 1.10 The Chow-Rashevsky may be of course obtained in different ways. For
instance, consider in R3 a totally nonholonomic rank two distribution generated
by two smooth vector fields X 1 , X 2 such that

 

Span X 1 (x), X 2 (x), X 1 , X 2 (x) = R3

x R3 .

Let x R3 and > 0 be fixed, define the function : R3 R3 by



 1


2
1
2
1
t1 , t2 , t3 := eX et3 X eX et2 X et1 X (x),
for every (t1 , t2 , t3 ) R3 . It can be shown that is a local diffeomorphism in
a neighborhood of the origin provided is small enough. This implies easily the
Chow-Rashevsky Theorem for contact distributions in dimension three.

1.5 Sub-Riemannian Structures


Definition. A sub-Riemannian structure on M is given by a pair (, g) where
is a totally nonholonomic distribution on M and g is a smooth Riemannian metric
on , that is for every x M, g(, ) is a scalar product on x . A simple way to
construct a sub-Riemannian structure is to take a smooth connected Riemannian
manifold (M, g), to consider a totally nonholonomic distribution on M, and to
take as sub-Riemannian metric the restriction of g to the distribution. In fact, any
sub-Riemannian structure can be obtained in this way.

34

1 Sub-Riemannian Structures

Example 1.20 The space R3 (with coordinates (x, y, z)) equipped with the rank
two distribution given in Example 1.2 and with the metric g = dx 2 + dy2 is
the most simple sub-Riemannian structure we can imagine. The length of a vector
v = (v1 , v2 , v3 ) (x, y, z) is given by
|v|g =

v12 + v22 .

Since v is an horizontal vector, the latter quantity does not vanishes unless v = 0.
1
m
If the distribution
admits
 a frame X , . . . , X on an open set O M, then the

family F = X 1 , . . . , X m is called an orthonormal dgenerating family of vector
fields or an orthonormal frame for (, g) in O if there holds



gx X i (x), X j (x) = ij

i, j = 1, . . . , m, x O,

where ij denotes the Kronecker symbol (that is ij = 1 if i = j and ij = 0 if i = j).


Sub-Riemannian structures admit local orthonormal frames in a neighborhood of
each point of M.
g

The sub-Riemannian distance. From now on, for every x M we denote by | |x


the sub-Riemannian norm on (x), that is
|v|gx =

gx (v, v)

v (x).

x,T
is defined by
The length of an horizontal path

#
lengthg ( ) :=
0

,
, (t),g dt.
(t)

T,

Note that since any horizontal path is absolutely continuous with square integrable
derivative, the length of any horizontal path is finite.
Let (, g) be a sub-Riemannian structure on M, by the Chow-Rashevsky Theorem, for every x, y M, there is at least one horizontal path joining x to y in time
1. For every x, y M, the sub-Riemannian distance between x and y, denoted by
dSR (x, y), is defined as the infimum of lengths of horizontal paths joining x to y, that
is,


x,1
dSR (x, y) := inf lengthg ( ) |
s.t. (1) = y .
The function dSR defines a distance on M M (the triangular inequality is easy, the
fact that dSR (x, y) x = y follows from the proof of Proposition 1.11) and makes
M a metric space. Given x M and r 0, we call sub-Riemannian ball centered at
x with radius r the set defined as


BSR (x, r) = y M | dSR (x, y) < r .

1.5 Sub-Riemannian Structures

35

The openness of End-Point mappings associated with totally nonholonomic distribution yields the following result.
Proposition 1.16 Let (, g) be a sub-Riemannian structure on M, then the topology
defined by dSR coincides with the original topology of M. In particular, the subRiemannian distance dSR is continuous on M M.
Proof We need to show that for every x M, the family of sub-Riemannian balls
{BSR (x, r)}r>0 is a basis of neighborhoods for x with respect to the original topology.
Let F = {X 1 , . . . , X m } be an orthonormal frame for on an open neighborhood
Vx of some x M Let V Vx be an open and relatively compact neighborhood of
x with respect to the initial topology. Let us show that there is r > 0 small enough
such that BSR (x, r) V . Let W be an open neighborhood of x such that W V .
Define the compact annulus A by
A =V \W.
Any continuous path joining x to a point outside V has to cross A . Hence since
X 1 , . . . , X m are bounded on A , there is > 0 such that any solution u : [0, 1] M
to the Cauchy problem
u (t) =

m


ui (t)X i (u (t))

a.e. t [0, 1], u (0) = x

i=1
x,1
and u (1)
/ V satisfies
with u UF

#
0

,
,
,
,

m
1 ,
i=1

,g
,
,
ui (t)X i (u (t)),
,

dt > .

u (t)

By Proposition 1.7, this means that the sub-Riemannian ball BSR (x, /2) is included in
V . Let us now show that any sub-Riemannian ball BSR (x, r) contains an open neigh-
x,1
borhood of x with respect to the initial topology. The set UF
is open in L 2 [0, 1]; Rm
and contains the control u 0. Thus there is > 0 such that the L 2 -ball BL2 (0, ) is
x,1
. Moreover, since F is orthonormal with respect to g, there holds
contained in UF
for every u BL2 (0, )
# 1
|u(t)|dt.
lengthg (u ) =
0

Thanks to the Cauchy-Schwarz inequality, we infer that



x,1 
BL2 (0, ) BSR (x, ).
EF
x,1
Proposition 1.12 together with EF
= x concludes the proof.

36

1 Sub-Riemannian Structures

1.6 Notes and Comments


Proposition 1.3 is taken from a paper by Sussmann [11]. The Hrmander condition
introduced in Sect. 1.1 is also refered as bracket generating condition. The term comes
from the analysis literature; it is named after Hrmander who obtained hypoellipticity
results for linear operators associated with families of vector fields [6]. Several other
terms may be used to refer to totally nonholonomic distributions. They are called
bracket generating by Montgomery [8], nonholonomic by Bellaiche [2], and they
refer to completely nonholonomic families of vector fields by Agrachev and Sachkov
[1].
The notion of singular curves play a major role in this monograph. Most of the
examples of singular horizontal paths given in Sect. 1.3 are classical. The most valuable (Example 1.19) is taken from [10].
Theorem 1.14 has been proved independently by Chow [4] and Rashevsky [9] in
the 1930s, see [4, 9]. The proof that we present here is an adaptation of the one given
by Bellaiche [2] to prove the so-called Orbit Theorem (see also [1, 7]). Other proofs
of the Chow-Rashevsky Theorem can be found in the texts of Bismut [3], Gromov
[5], or Montgomery [8].

References
1. Agrachev, A.A., Sachkov, Yu.L.: Control Theory from the Geometric Viewpoint. Encyclopaedia of Mathematical Sciences, vol. 87. Springer-Verlag, Heidelberg (2004)
2. Bellaiche, A.: The tangent space in sub-Riemannian geometry. In: Sub-Riemannian Geometry,
pp. 178. Birkhuser, Basel (1996)
3. Bismut, J.-M.: Large Deviations and the Malliavin Calculus. Progress in Mathematics, vol. 45.
Birkhuser, Boston (1984)
4. Chow, C.-L.: ber systeme von linearen partiellen differentialgleichungen ester ordnung. Math.
Ann. 117, 98105 (1939)
5. Gromov, M.: Carnot-Carathodory spaces seen from within. In: Sub-Riemannian Geometry,
pp. 79323. Birkhuser, Basel (1996)
6. Hrmander, L.: Hypoelliptic second order differential operators. Acta. Math. 119, 147171
(1967)
7. Jurdjevic, V.: Geometric Control Theory. Cambridge Studies in Advanced Mathematics, vol.
52. Cambridge University Press, Cambridge (1997)
8. Montgomery, R.: A tour of subriemannian geometries, their geodesics and applications. In:
Mathematical Surveys and Monographs, vol. 91. American Mathematical Society, Providence,
RI (2002)
9. Rashevsky, P.K.: About connecting two points of a completely nonholonomic space by admissible curve. Uch. Zapiski Ped. Inst. Libknechta 2, 8394 (1938)
10. Sussmann, H.J.: A cornucopia of four-dimensional abnormal sub-Riemannian minimizers. In:
Sub-Riemannian Geometry, pp. 341364. Birkhuser, Basel (1996)
11. Sussmann, H.J.: Smooth distributions are globally finitely spanned. In: Analysis and Design
of Nonlinear Control Systems, pp. 38. Springer-Verlag, Heidelberg (2008)

Chapter 2

Sub-Riemannian Geodesics

Throughout all the chapter, M denotes a smooth connected manifold without boundary of dimension n 2 equipped with a sub-Riemannian structure (, g) of rank
m n.

2.1 Minimizing Horizontal Paths and Geodesics


Definition. Given x, y M, we call minimizing horizontal path between x and y any
x,T
with T 0 such that
path
dSR (x, y) = lengthg ( ).
Like in the Riemannian case, minimizing paths with constant speed minimize the
so-called sub-Riemannian energy. Given x, y M, we define the sub-Riemannian
energy between x and y by


x,1
s.t. (1) = y ,
eSR (x, y) := inf energyg ( ) |
x,1
where the energy of a path
is defined as


energy ( ) :=
g

1 

 2
 (t)g
dt.
(t)

The following result whose proof is based on Cauchy-Schwarzs inequality, is


fundamental.
Proposition 2.1 For any x, y M, eSR (x, y) = dSR (x, y)2 .

L. Rifford, Sub-Riemannian Geometry and Optimal Transport,


SpringerBriefs in Mathematics, DOI: 10.1007/978-3-319-04804-8_2,
The Author(s) 2014

37

38

2 Sub-Riemannian Geodesics

Proof Let x, y M be fixed. First, we observe that, for every horizontal path :
[0, 1] M satisfying (0) = x and (1) = y, the Cauchy-Schwarz inequality yields


1
0

| (t)| (t) dt

1

| (t)| (t)

2

dt.

(2.1)

x,1
such that (1) = y yields dSR (x, y)2
Taking the infimum over the set of
x,1
eSR (x, y). On the other hand, for every > 0, there exists an horizontal path
,
with (1) = y, such that

length ( ) =
g

| (t)| (t) dt dSR (x, y) + .

x,1
with (1) = y
Reparametrizing by arc-length, we get a new path
satisfying



 (t)g

(t)

= lengthg ( )

a.e. t [0, 1].

Consequently,


1 

eSR (x, y)
0

 2
 (t)g
dt = lengthg ( )2 (dSR (x, y) + )2 .
(t)

Letting tend to 0 completes the proof of the result.

x,1
Given x, y M, we call minimizing geodesic between x and y any path
joining x to y such that
eSR (x, y) = energyg ( ).

Thanks to the above proof and the fact that equality holds in the Cauchy-Schwarz
g
inequality (2.1) if and only has constant speed (that is | (t)| (t) is constant), we
obtain the following result.
x,1
Proposition 2.2 Given x, y M, a path
is a minimizing geodesic between
x and y if and only if it is a minimizing horizontal path between x and y with constant
speed.

Sufficiently near points can be joined by minimizing geodesics and a fortiori by


minimizing horizontal paths.
Proposition 2.3 Let x M, then there is > 0 such that the following property is
satisfied:
For every y, z BSR (x, ) and any minimizing sequence { k }k : [0, 1] M of
horizontal paths with constant speed such that
lim k (0) = y,

k+

lim k (1) = z,

k+


lim lengthg k = dSR (y, z), (2.2)

k+

2.1 Minimizing Horizontal Paths and Geodesics

39

up to taking a subsequence, { k }k converges uniformly to some minimizing geodesic


y,1
joining y to z.
In particular, for every y, z BSR (x, ), there is a minimizing geodesic between
y and z.
Proof Fix x M and F = {X 1 , . . . , X m } an orthonormal frame for on an open
and relatively compact neighborhood Vx of x. From Proposition 1.13, there is r > 0
small enough such that BSR (x, r) Vx . For any y BSR (x, r/4) and any horizontal
path : [0, 1] M with constant speed satisfying

r
2r
and lengthg ( ) ,
dSR (0), y <
24
3
we have for every t [0, 1],

dSR x, (t) dSR (x, y) + dSR y, (t)

r/4 + dSR y, (0) + lengthg ( )


r/4 + r/24 + 2r/3 = 23r/24 < r.
Which means that
is contained in BSR (x, r). Furthermore, for every such horizontal
path, there is u L 2 [0, 1]; Rm such that
(t) =

ui (t)X i (t)

and

| (t)| (t) = uL2 = lengthg ( ),

i=1

for a.e. t [0, 1]. Let y, z BSR (x, r/4) be fixed and { k }k : [0, 1] M be
a sequence of horizontal paths with constant speed verifying (2.2). By the above
discussion, we may assume without loss of generality that all the paths k : [0, 1]
M are valued in the compact set Vx with

derivatives
bounded by r and associated
with a sequence of controls {uk }k in L 2 [0, 1]; Rm such that uk L2 = lengthg ( k ).
Then by Arzela-Ascolis theorem taking a subsequence if necessary the sequence
k
{ k }k converges to some : [0, 1] M. Moreover, the sequence {u

}k is bounded

2
2
in L so it weakly converges up to a subsequence to some v L [0, 1]; Rm . We
obtain easily that (0) = y, (1) = z,
(t) =

v i (t)X i (t)

a.e. t [0, 1],

i=1

and by lower semicontinuity of the L 2 -norm under weak convergence we immediately


deduce that


vL2 lim uk 2 = dSR (y, z).
k+

40

2 Sub-Riemannian Geodesics

Furthermore, since is an horizontal path joining y to z, there holds


dSR (y, z) lengthg ( ).
By Cauchy-Schwarzs inequality, we have lengthg ( ) vL2 . Then we infer that


energyg = v2L2 = dSR (y, z)2 = eSR (y, z).
Which shows that is a minimizing geodesic joining y to z.

Remark 2.1 The above proof shows indeed that up to taking a subsequence, the
sequence {uk }k converges strongly to v in L 2 ([0, 1]; Rm ). As a matter of fact, it
converges weakly to v and satisfies


lim uk

k+

L2

= vL2 .

The SR Hopf-Rinow Theorem. The following sub-Riemannian version of the classical Riemannian Hopf-Rinow Theorem holds.
Theorem 2.4 (Hopf-Rinow Theorem) Let (, g) be a sub-Riemannian structure
on M. Assume that (M, dSR ) is a complete metric space. Then the following properties
hold:
(i) The balls B SR (x, r) are compact (fort any r 0).
(ii) For every x, y M there exists at least one minimizing geodesic joining x to y.
Proof Let us first recall that thanks to Proposition 1.13, the metric space (M, dSR ) is
locally compact. That is for every x M, there is r > 0 such that the ball B SR (x, r) is
compact. Let x M be fixed. We first show that all the balls B SR (x, r) with r 0 are
compact. Denote by Ix the set of r 0 such that B SR (x, r) is compact. By inclusion
of the balls B SR (x, r ) B SR (x, r) if r r and local compactness of (M, dSR ), Ix
is an interval whose supremum Rx is strictly positive. We claim that I is both closed
and open in [0, +).
Lemma 2.5 The interval Ix is closed in [0, +).
Proof (Proof of Lemma 2.5) We need to show that Rx belongs to Ix , that is that
B SR (x, Rx ) is compact. Let {yk }k be a sequence of points in B SR (x, Rx ), we need to
show that it has a convergent subsequence. We construct a Cauchy subsequence of
{yk }k as follows. For every integer l 1, we set



K l = B SR x, Rx 1 2l .
By assumption, {K l }l is an increasing sequence of compact sets in B SR (x, Rx ). For
every k N, there is yk1 K 1 such that

2.1 Minimizing Horizontal Paths and Geodesics

41




 R
x
.
dSR yk , yk1 = inf dSR (yk , z) | z K 1
2
By compactness of K 1 , there is a strictly increasing mapping 1 : N N such that
the sequence {y1 1 (k) }k converges to some y 1 K 1 . Thus there exists k1 0 such that

 R
x
dSR y1 1 (k) , y 1
2

k k1 .

Set z1 := y 1 (k1 ) . Now for every k N, there is yk2 K 2 such that





 R
x
.
dSR y 1 (k) , yk2 = inf dSR (y 1 (k) , z) | z K 2
4
Again, by compactness of K 2 there exists a strictly increasing mapping 2 : N N
such that the sequence {y2 2 (k) }k converges to some y 2 K 2 and then there is k2 k1
such that

 R
x
dSR y2 2 (k) , y 2
4

k k2 .

Set z2 := y( 1 2 )(k2 ) . By construction, there holds






dSR (z1 , z2 ) dSR z1 , y1 1 (k ) + dSR y1 1 (k ) , z2
1
1




1
= dSR y 1 (k1 ) , y 1 (k ) + dSR y1 1 (k ) , y( 1 2 )(k2 )
1
1






Rx
1
1
1 1
1
+ dSR y 1 (k ) , y + dSR y , y( 1 2 )(k ) + dSR y(

1 2 )(k ) , z2
1
2
2
2
Rx
Rx
Rx
Rx
+
+
+
2Rx .

2
2
2
2

Repeating this construction yields a sequence of strictly increasing mappings { l }l ,


a sequence (with two indices) {ykl }k,l , a sequence of limits {yl }l , and a nondecreasing
sequence of integers {kl }l such that



 R
x
dSR yk , ykl = inf dSR (yk , z) | z K l l
2
and

 R
x
dSR yl l (k) , y l l
2

k kl .

Define the sequence {zl }l by


zl := y( 1 2 l )(kl )

l.

42

2 Sub-Riemannian Geodesics

Then proceeding as above shows that for every l 1, one has

4Rx
dSR zl , zl+1 l .
2
Hence {zk }k is a Cauchy sequence in B SR (x, Rx ). Since (M, dSR ) is complete, it

converges to some z B SR (x, Rx ).
Lemma 2.6 The interval Ix is open in [0, +).
Proof (Proof of Lemma 2.6) We need to show that if R Ix , then there is > 0 such
that R + belongs to Ix . Let R > 0 in Ix be fixed. Denote by BSR (x, R) the boundary
of B SR (x, R), that is BSR (x, R) = B SR (x, R) \ BSR (x, R). Since B SR (x, R) is assumed
to be compact, its boundary is compact too. From Proposition 2.3, we know that for
every y BSR (x, R), there is y > 0 such that B SR (y, 2y ) is compact. Since

BSR (x, R) yBSR (x,R) BSR y, y ,


there is a finite number of points y1 , . . . , yN in BSR (x, R) such that

BSR (x, R) N
i=1 BSR yi , yi .
Set

yi
| i = 1, . . . , N .
= min
2


We prove easily that




B SR (x, R + ) B SR (x, R) N
i=1 BSR yi , 2yi
which is a finite union of compact sets, hence compact as well. This shows that

B SR (x, R + ) is compact.
In conclusion, Ix is both open and closed in [0, +). Hence Ix = [0, +) which
concludes the proof of (i). Let us now prove assertion (ii). We note that since does
not necessarily admit a global orthonormal frame on M, we cannot repeat verbatim
the proof of Proposition 2.3. Let x, y M be fixed, set R := max{2dSR (x, y), 1}. By
(i), we know that B SR (x, R) is compact. Let { k }k be a sequence of horizontal paths
x,1
joining x to y such that
with constant speed in
dSR (x, y) = lim length( k ).
k+

2.1 Minimizing Horizontal Paths and Geodesics

43

Without loss of generality we may assume that


length( k ) < R

k,

which means that all the curves k remain in B SR (x, R). By Proposition 2.3, for every
z B SR (x, R) there is z > 0 such that any minimizing sequence of horizontal paths
with constant speed contained in BSR (z, z ) converges uniformly (up to taking a
subsequence) to some minimizing geodesic. By compactness, there are z1 , . . . , zL
B SR (x, R) and an integer N > 1 with R/N < min{1 , . . . , L }/4 such that
BSR (x, R)

L


BSR zl , 1/N .

l=1

Set for every j = 0, . . . , N, tj := j/N, for every j = 0, . . . , N 1, Ij := [tj , tj+1 ], and


denote by jk the restriction of k to the interval Ij . Fix j {0, . . . , N 1}. For every
k, there is l {1, . . . , L} (which may depend on k) such that dSR ( k (tj ), zl ) < 1/N,
then

1
R
lengthg ( k )
< + < l ,
dSR k (t), zl dSR ( k (tj ), zl ) +
N
N
N
for every t Ij . This shows that each piece of horizontal path jk with length
lengthg ( k )/N is contained in some BSR (zl , l ). Therefore, up to taking a subsequence, the sequence {jk }k converges to some minimizing geodesic with length
dSR (x, y)/N. We deduce easily the existence of a subsequence of { k }k converging
to some minimizing geodesic between x and y.

Remark 2.2 In fact, we proved a global version of Proposition 2.3. If (M, dSR ) is
a complete metric space, then for every x, y M and every minimizing sequence
x,1
joining x to y such that
{ k }k of horizontal paths with constant speed in
dSR (x, y) = lim length( k ),
k+

up to taking a subsequence, { k }k converges uniformly to some minimizing geodesic


joining x to y.
We shall say that the sub-Riemannian structure (, g) on M is complete if the
metric space (M, dSR ) is complete. The following result holds.
Proposition 2.7 Let (, g) be a sub-Riemannian structure on M, assume that (M, g)
is a complete Riemannian manifold. Then for any totally nonholonomic distribution
, the SR structure (, g) on M is complete.

44

2 Sub-Riemannian Geodesics

Fig. 2.1 An orthonormal frame along

Proof Denote by dg the Riemannian geodesic distance on M with respect to g. Since


the set of paths joining x to y contains the set of horizontal paths joining x to y, there
holds
dg (x, y) dSR (x, y)

x, y M.

Therefore, any Cauchy sequence with respect to dSR is a Cauchy sequence with
respect to dg . Hence it is convergent. Since both topology coincide, it is convergent

with respect to dSR as well.

2.2 The Hamiltonian Geodesic Equation


Throughout all the section, we assume that the SR structure (, g) is complete.
Thanks to Theorem 2.4, minimizing geodesics exist between any pair of points in M.
Normal and abnormal geodesics. Let x, y M and a minimizing geodesic
x,1

joining x to y be fixed. Since minimizes the distance between x and y it cannot


have self-intersection. Hence (, g) admits an orthonormal frame along (Fig. 2.1).
Then there is an open neighborhood V of ([0, 1]) in M and an orthonormal family
F (with respect to the metric g) of m smooth vector fields X 1 , . . . , X m such that


z V .
(z) = Span X 1 (z), . . . , X m (z)
Moreover, there is a control u L 2 ([0, 1]; Rm ) (which indeed belong to the open
set UFx,1 which was defined in Proposition 1.7) such that
(t) =

ui (t)X i ( (t))

a.e. t [0, 1].

i=1

Since is a minimizing geodesic between x and y, it minimizes the energy among


all horizontal paths joining x to y. Since there is a local one-to-one correspondence
between the set of horizontal paths starting at x and the set of trajectories of some
control system (see Proposition 1.7), the control u minimizes the quantity

2.2 The Hamiltonian Geodesic Equation


0

gu (t)


m

ui (t)X i (x,u (t)),

i=1

45



ui (t)X i (u (t)) dt =
0

i=1

m
1

ui (t)2 dt =: C(u),

i=1

among all controls u L 2 ([0, 1]; Rm ) such that the solution u : [0, 1] M of the
Cauchy problem
u (t) =

ui (t)X i (u (t)) a.e. t [0, 1],

u (0) = x,

i=1
x,1
has been defined
is well-defined on [0, 1] and satisfies (the End-Point mapping EF
in Chap. 1)
x,1
(u) = y.
EF

In other terms, there is an open set U L 2 ([0, 1]; Rm ) such that u is solution to
the following optimization problem:
x,1
(u) = 1.
u minimizes C(u) among all u U with EF

By the Lagranges Multipliers Theorem (see Theorem B.2), there is p Ty M (Rn )

and 0 {0, 1} with (0 , p = (0, 0) such that


x,1
p Du EF
(v) = 0 Du C(v)

v L 2 [0, 1]; Rm .

(2.3)

Two cases may appear, either 0 = 0 or 0 = 1. By restricting V if necessary, we


can assume that the cotangent bundle T M is trivializable with coordinates (x, p)
V (Rn ) over V .
First case: 0 = 0.
Then we have p Ty M \ {0} (Rn ) \ {0} satisfying
x,1
p Du EF
(v) = 0

v L 2 [0, 1]; Rm .

x,1
This means that some nonzero linear form annihilates the image of EF
. Then u is
singular with respect to x and F or equivalently the path is singular with respect
to . By Proposition 1.11 and Remark 1.8, admits an abnormal extremal lift, that
is there is an absolutely continuous arc p : [0, 1] (Rn ) \ {0} with p(1) = p which
satisfies

p (t) =

i=1

ui (t) p(t) D (t) X i

a.e. t [0, 1]

46

2 Sub-Riemannian Geodesics

and

p(t) X i (t) = 0, t [0, 1]

i = 1, , m.

In other terms, is a singular minimizing geodesic.


Second case: 0 = 1.
Define in local coordinates, the Hamiltonian H : V (Rn ) R by


m
m
m
2
1

1 2
i
i
H(x, p) :=
p X (x) = max
ui p X (x)
ui
2
2
uRm
i=1

i=1

(2.4)

i=1

for all (x, p) V (Rn ) . Then the following result holds.


Proposition 2.8 Equality (2.3) with 0 = 1 yields the existence of a smooth arc
p : [0, 1] (Rn ) with p(1) = 2p , such that the pair ( , p) satisfies

(t) =

 i
m 
i
i=1 p(t) X ( (t)) X ( (t))
p (t) = H ( (t), p(t)) = m p(t) X i ( (t)) p(t) D X i
(t)
i=1
x
H
p ( (t), p(t))

(2.5)

for a.e. t [0, 1] and

ui (t) = p(t) X i ( (t))

for a.e. t [0, 1], i = 1, . . . , m.

In particular, the path is smooth on [0, 1].


Proof The differential of C : L 2 ([0, 1]; Rm ) R at u is given by
Du C(v) = 2u , vL2

v L 2 ([0, 1]; Rm ).

x,1
Moreover by Remark 1.5, the differential of EF
at u is given by


x,1
Du EF
(v) = S(1)

S(t)1 B(t)v(t)dt

v L 2 ([0, 1]; Rm ),

where the functions A, B, S were defined in Remark 1.5. Hence (2.3) yields


p S(1)S(t)1 B(t) 2u (t) v(t)dt = 0

v L 2 ([0, 1]; Rm ).

Which implies
u (t) =


1
p S(1)S(t)1 B(t)
2

a.e. t [0, 1].

(2.6)

2.2 The Hamiltonian Geodesic Equation

47

Let us define p : [0, 1] (Rn ) by


p(t) :=

1
p S(1)S(t)1
2

t [0, 1].

By construction, for a.e. t [0, 1] we have u (t) = p(t) B(t), which means
that (2.6) is satisfied. Furthermore, as in the proof of Proposition 1.11, we have
p (t) = p(t) A(t) for a.e. t [0, 1]. This means that (2.5) is satisfied for a.e.
t [0, 1]. The pair ( , p) is solution to a smooth autonomous differential equation,
hence it is smooth.

The curve : [0, 1] T M given by (t) = ( (t), p(t)) for every t [0, 1]
is a normal extremal whose projection is and which satisfies (1) = (y, 2p ). We
say that is a normal extremal lift of . We also say that is a normal minimizing
geodesic.
Define the sub-Riemannian Hamiltonian H : T M R as follows. For every
x M, the restriction of H to the fiber Tx M is given by the nonnegative quadratic
form


p(v)2
1
| v (x) \ {0} .
(2.7)
p max
2
gx (v, v)

=
Let H denote the Hamiltonian vector field on T M associated to H, that is,
H
dH, or in local coordinates


H
H

(x, p),
(x, p) .
H (x, p) =
p
x

A normal extremal is an integral curve of H defined on some interval [0, T ], i.e., a

curve : [0, T ] T M such that (t)


= H ((t)), for t [0, T ]. The projection
of a normal extremal : [0, T ] T M is a smooth horizontal path := :
[0, T ] M with constant speed given by


 (t)g

(t)

2H (t)

t [0, T ].

We check easily that the Hamiltonian defined by (2.7) reads as (2.4) in local coordinates. Then the previous study yields the following result.
Theorem 2.9 Let : [0, 1] M be a minimizing geodesic between x and y in M.
One of the two following non-exclusive cases occur:
is singular.
admits a normal extremal lift in T M.
Be careful, a minimizing geodesic could be both singular and the projection of a
normal extremal. In Sect. 2.5, we shall see several examples of minimizing geodesics,

48

2 Sub-Riemannian Geodesics

including the cases of singular normal minimizing geodesics and strictly abnormal
minimizing geodesic, that is abnormal geodesics admitting no normal extremal lift.
Remark 2.3 In the Riemannian case, that is if has rank m = n, any path is horizontal and regular (see Remark 1.9). As a consequence any minimizing geodesic is
normal.
Short normal geodesics are minimizing. Projections of normal extremals are minimizing for short times.
Proposition 2.10 Let x M and p Tx M with H(x , p ) = 0 be fixed. Then there is
a neighborhood W of p in Tx M and > 0 such that every normal extremal so that
(0) = (x , p) (in local coordinates) belongs to W minimizes the SR energy on the
interval [0, ]. That is if we set := : [0, ] M, then we have

eSR (0), () = 2H(x, p)2 .


In particular, minimizes the length between x and ().
Proof Since the result is local, we can assume that we work in Rn . Then we can
assume that (, g) admits an orthonormal frame F = {X 1 , . . . , X m }. For sake of
simplicity, we identify (Rn ) with Rn . Then the Hamiltonian H : Rn Rn R
which were defined in (2.4) and (2.7) is given by

H(x, p) := max p,


uRm

i=1

1 2
ui X i (x)
ui
2
m

i=1

1
p, X i (x)2 ,
=
2
m

i=1

for every (x, p) Rn Rn .


Our aim is now to prove the following result: for every p0 Rn such that
H(x , p0 ) = 0, there exist a neighborhood W of p0 in Rn and > 0 such that
every solution (x, p) : [0, ] Rn Rn of the Hamiltonian system

(t)
=
(x(t),
p(t))
=
p(t), X i (x(t))X i (x(t))

p
i=1
m

p(t), X i (x(t)) Dx(t) X i p(t) ,

p (t) = x (x(t), p(t)) =

(2.8)

i=1

with x(0) = x and p(0) W , satisfies

2H x , p0 =

p(t), X i (x(t))2 dt

i=1

i=1

for every control u L 2 ([0, ]; Rm ) such that the solution of

ui (t)2 dt,

(2.9)

2.2 The Hamiltonian Geodesic Equation

y (t) =

ui (t)X i (y(t)), y(0) = x ,

49

(2.10)

i=1

satisfies y() = x(). Let p0 Rn with H(x , p0 ) = 0 be fixed, we need the following
lemma.
Lemma 2.11 There exist a neighborhood W of p0 and > 0 such that, for every
p W , there exists a function S : B(x , ) R of class C 1 which satisfies
H(x, S(x)) = H(x , p), x B(x , ),

(2.11)

and such that, if (x p , pp ) : [, ] Rn Rn denotes the solution of (2.8) satisfying


x p (0) = x and pp (0) = p, then
S(x p (t)) = pp (t), t (, ).

(2.12)

Proof (Proof of Lemma 2.11) The proof consists in applying the method of characteristics. Let be the linear hyperplane such that p0 , v = 0 for every v . We
first show how to construct locally S as the solution of the Hamilton-Jacobi Equation
(2.11) which vanishes on x + and such that S(x ) = p0 . Up to considering a
smaller neighborhood V , we assume that H(x, p0 ) = 0 for every x V . For every
x (x + ) V , set

H(x , p0 )
p0 .
p (x) :=
H(x, p0 )
Then, H(x, p (x)) = H(x , p0 ) and p (x) , for every x V . There exists > 0
such that, for every x (x + ) V , the solution (xx , px ) of (2.8), satisfying
xx (0) = x and px (0) = p (x), is defined on the interval (, ).
For every x (x + ) V and every t (, ), set (t, x) := xx (t). The
mapping (t, x) 
(t, x) is smooth. Moreover, (0, x) = x for every x (x + )V
p(x), X i (x )X i (x ) does not belong to . Hence there exists
and (0, x ) = m
i=1 
(0, ) with B(x , ) V such that the mapping is a smooth diffeomorphism from
(, ) ((x + ) B(x , )) into a neighborhood V of x . Denote by = (, )
the inverse function of , that is the function such that ( )(x) = ( (x), (x)) = x
for every x V . Define the two vector fields X and P by
X(x) := ( (x), (x)) and P(x) := p(x) ( (x)), x V .
Then,

50

2 Sub-Riemannian Geodesics
m

X( (t, x)) = (t, x) = x x (t) =

px (t), X i (xx (t))X i (xx (t))

i=1
m

P( (t, x)), X i ( (t, x))X i ( (t, x)),

i=1

and
m

P( (t, x)), X i (xx (t))2 =

i=1

px (t), X i (xx (t))2 = 2H(x, p (x)) = 2H(x , p0 ),

i=1

for every t (, ) and every x (x + ) B(x , ). For every x V , set


i (x) := P(x), X i (x). Hence,
X(x) =

i (x)X i (x) and

i=1

i (x)2 = 2H(x , p0 ),

i=1

for every x V . Define the function S : V R by


S(x) := 2H(x , p0 ) (x), x V .
We next prove that S(x) = P(x) for every x V . For every t (, ), denote
by Wt := {y V | (y) = t}. In fact, Wt coincides with the set of y V such
that S(y) = 2H(x , p0 )t. It is a smooth hypersurface which satisfies S(y) Ty Wt
for every y Wt . Let y Wt be fixed, there exists x (x + ) B(x , ) such that
y = (t, x) = xx (t). Let us first prove that P(y) = px (t) is orthogonal to Ty Wt . To
this aim, without loss of generality we assume that t > 0. Let w Ty Wt , there exists
v such that w = Dx t (v). For every s [0, t], set z(s) := Dx (s, x)(v). We have
z (s) =

d
d
d
x)v = X( (t, x))v = D(t,x) X(z(s)).
Dx (s, x)v = (t,
ds
dx
dx

Hence,
d
z(s), px (s) = z( s), px (s) + z(s), p x (s)
ds
= D(s,x) Xz(s), px (s)
z(s),

px (s), X i (xx (s)) Dxx (s) X i px (s) .


i=1

Since X(x) =
there holds

m

i=1 i (x)X

i (x)

and

m

i=1 i (x)

= 2H(x , p0 ) for every x V ,

2.2 The Hamiltonian Geodesic Equation

Dxx (s) X

51

m
m


px (s) =
i (xx (s)) Dxx (s) X i px (s) +
X i (xx (s)), px (s)i (xx (s))
i=1

=
=

i=1
m

i=1

i (xx (s)) Dxx (s) X

px (s) +

i (xx (s))i (xx (s))

i=1

i (xx (s)) Dxx (s) X i px (s) .

i=1

We deduce that

d
ds z(s), px (s)

= 0 for every s [0, t]. Hence,

w, P(y) = w, px (t) = z(t), px (t) = z(0), p (x) = 0.


This proves that P(y) is orthogonal to Ty Wt , which implies that P(y) and S(y) are
colinear. Furthermore, since S(xx (s)) = 2H(x , p0 )s for every s [0, t], one gets
S(xx (t)), x x (t) = 2H(x , p0 ) = px (t), x x (t).
Since x x (t) = X(y) does not belong to Ty Wt , we deduce that S(xx (t)) = px (t). In

consequence, we proved that S(x) = P(x) for every x V .
Let us now conclude the proof of Proposition 2.10. Clearly, there exists > 0
such that every solution (x, p) : [0, ] Rn Rn of (2.8), with x(0) = x and
p(0) W , satisfies
x(t) B(x , ), t [0, ].
Moreover, we have by (2.11)(2.12)
S(x()) S(x ) = 2H(x , p).
Let u L 2 ([0, ]; Rm ) be a control such that the solution y : [0, ] W of (2.10)
starting at x satisfies y() = x(). We have
S(x()) S(x ) = S(y()) S(y(0))

d
=
(S(y(t))) dt
dt
0
S(y(t)), y (t)dt
=


1
ui (t)2 dt
2
m

H(y(t), S(y(t))) +

= H(x , p) +

1
2


0

i=1

i=1

ui (t)2 dt.

52

2 Sub-Riemannian Geodesics

Fig. 2.2 The method of characteristics

Inequality (2.9) follows.

2.3 The Sub-Riemannian Exponential Map


Definition. Recall that the SR Hamiltonian H : T M R which is canonically
associated with our SR structure (, g) is defined by


p(v)2
1
| v (x) \ {0}
H(x, p) = max
2
gx (v, v)

(x, p) T M.

We recall that a normal extremal is a curve : [0, T ] T M satisfying

(t)
= H ((t))

t [0, T ].

Let x M be fixed. We first define the domain Ex Tx M of the SR exponential


map by,


Ex := p Tx M | x,p is defined on the interval [0, 1] ,
where x,p is the normal extremal so that x,p (0) = (x, p) in local coordinates.
The set Ex is an open subset of Tx M containing the origin and star-shaped with
respect to 0.
The sub-Riemannian exponential map from x is defined by
expx : Ex Tx M M

p x,p (1) .

2.3 The Sub-Riemannian Exponential Map

53

By rescaling, if (xp , pp ) : [0, T ] T M is the trajectory of the Hamiltonian

vector field H with x(0) = x, p(0) = p, then we have

xp (t), pp (t) = xp (t), pp (t)

t [0, T /], > 0.

Then, for every p Tx M, the curve p : [0, 1] M defined by

p (t) := expx (tp) = x,p (t)

t [0, 1],

is an horizontal path with constant speed satisfying

energyg ((x,p )) = lengthg ((x,p )) = 2H x,p (0) = 2H(x, p).

Proposition 2.13 Assume that (, g) is complete. Then


Ex = Tx M

x M.

Proof We argue by contradiction. Let x M and = ( , pp ) : [0, T ) T M be


a normal extremal starting at (x , p ) Tx M that extends to no interval [0, T + ) for
> 0. Let {tk }k be any increasing sequence that approaches
T , and set yk := (tk ).
Since is an horizontal path with constant speed V = 2H(x , p ), we have



dSR yk , yl V tk tl 

k, l.

Then {yk }k is a Cauchy sequence in M. By completeness {yk }k converges to some


point y M. Let {X 1 , . . . , X m } be a local orthonormal frame in a small ball BSR (y, r).
In local coordinates near y, H reads
2
1

H(x, p) =
p X i (x)
2
m

i=1

and ( , pp ) satisfies the differential system

(t) =

 i

 
i
(t), pp (t) = m
i=1 pp (t) X (t) X (t)
p (t) = H
(t), p (t) = m p (t) X i
(t) p (t) D X i ,
p
p
p
(t)
i=1 p
x
H
p

for t [T , T ) with > 0 small enough. Since H is constant along ( , pp ), we


have




i = 1, . . . , m,
pp (t) X i (t)  mV

54

2 Sub-Riemannian Geodesics

and by compactness, the vector fields X1 , . . . , Xm and their differentials are bounded
in B SR (y, r). Thus there is a constant K > 0 such that




p p (t) K pp (t)

t [T , T ).

By Gronwalls Lemma (see Lemma A.1), we infer that both and pp are uniformly
bounded near T . This means that the extremal can be extended beyong T , which
gives a contradiction.

Remark 2.4 Let (x, p) T M such that p is singular be fixed. By Proposition 1.11,
p is the projection of an abnormal extremal : [0, 1] T M (written as (p , q)
in local coordinates). Then taking local coordinates, for every R the curve (here
x,p = (p , pp ) denotes the normal extremal starting at (x, p))

t [0, 1] x,p (t) + (t) = p (t), pp (t) + q(t)


is a normal extremal starting at (x, p + q). Then we have

expx p + q = expx (p)

R.

Remark 2.5 Let (, g) be a complete sub-Riemannian structure on M and x


M be fixed. If (, g) does not admit singular minimizing curves from x, then the
exponential map from x is onto. As a matter of fact, for every y M, there is a
minimizing geodesic : [0, 1] M joining x to y. Since is not singular, it is
the projection of a normal extremal (see Theorem 2.9), which means that there is
p Tx M such that expx (p) = y.
On the image of the Sub-Riemannian exponential map. The functions expx are
almost onto.
Theorem 2.14 Assume that (, g) is complete and let x M be fixed. There is an
open and dense set D M such that for every y D there is py Tx M satisfying

expx py = y and dSR (x, y) = 2H x, py .


In particular, the set expx (Tx M) contains an open dense subset of M.
Proof Let us begin with a preparatory lemma.
Lemma 2.15 Let y = x in M be such that there is a function : M R differentiable at y such that
2
2
(x, y) and dSR
(x, z) (z) z M.
(y) = dSR

Then there is a unique minimizing geodesic : [0, 1] M between x and y. It is the


projection of a normal extremal : [0, 1] T M satisfying (1) = (y, 21 Dy ). In
particular x = expy ( 21 Dy ).

2.3 The Sub-Riemannian Exponential Map

55

2 (x, z) for any z M, the assumpProof (Proof of Lemma 2.15) Since eSR (x, z) = dSR
tion of the proposition implies that there is a neighborhood U of y in M such that

eSR (x, z) (z) z U

and

eSR (x, y) = (y).

(2.13)

Since (M, dSR ) is complete, there exists a minimizing geodesic : [0, 1] M


between x and y. As before, we can parametrize the distribution by a orthonormal
family F of smooth vector fields X 1 , . . . , X m in a neighborhood V of ([0, 1]), and
we denote by u the control corresponding to . By construction, it minimizes the
quantity


m
1

C(u) =
0

ui (t)2 dt,

i=1

among all the controls u L 2 ([0, 1]; Rm ) which are admissible with respect to x, F
x,1
(u) = y. Let u L 2 ([0, 1]; Rm ) be a
and V and which satisfy the constraint EF
x,1
control admissible with respect to x, F and V such that EF
(u) U . By (2.13)
one has




x,1
x,1
(u) EF
(u) .
C(u) eSR x, EF
Moreover


x,1
(u ) .
C(u ) = eSR (x, y) = (y) = EF
Hence u minimizes the functional D : L 2 ([0, 1]; Rm ) R defined as


x,1
D(u) := C(u) EF
(u) ,
x,1
over the set of controls u L 2 ([0, 1]; Rm ) such that EF
(u) U . This means that

u is a critical point of D. Setting = Dy , we obtain


x,1
Du C = 0.
Du EF

By Proposition 2.8, the path admits a normal extremal lift : [0, 1] T M


satisfying (1) = (y, 21 Dy ). By the Cauchy-Lipschitz Theorem, such a normal
extremal is unique.

Denote by Px the set of points in M such that there is a unique normal minimizing
geodesic y from x to y. The previous lemma yields easily the following result.
Lemma 2.16 The set Px is dense in M.
Proof (Proof of Lemma 2.16) Let y M and r > 0 be fixed. Let : M R be a
smooth function such that

56

2 Sub-Riemannian Geodesics

(y) = 0 and (z) 2r z BSR (y, r).


The continuous function
z B SR (x, r) dSR (x, z) + (z)
is equal to dSR (x, y) at z = y and by the triangle inequality it is larger than dSR (x, y)+r
for z BSR (y, r). Then there is z BSR (y, r) such that


dSR (x, z) dSR x, z + z (z)

z BSR (y, r).




We conclude easily by Lemma 2.15.

For every y Px , denote by rank(y) the rank of the minimizing horizontal path
y (see Sect. 1.3).
Lemma 2.17 The set of y Px with rank(y) = n is dense in M.
Proof (Proof of Lemma 2.17) We argue by contradiction. Assume that there is an
open set O M such that any point y Px O has rank < n. Set


r := max rank(y) | y Px O .
Fix y Px O such that rank(y) = r and set := y . For every y Px O denote
by y the affine subspace of Tx M such that p = y , that is the space of p Tx M
such that

t [0, 1].
p (t) = expx (tp) = x,p (t) = y (t)
Remembering Remark 2.4, we observe that the dimension of y is exactly equal
to n rank(y). As a matter of fact, given y Px O and an orthonormal family
F = {X 1 , . . . , X m } in a neighborhood V along y , remembering the arguments
given in Proposition 2.8 we check that p Tx M belongs to y if and only if
x,p (1) = (y, pp (1)) satisfies
x,1
(v) = Duy C(v)
2pp (1) Duy EF

v L 2 [0, 1]; Rm .

(2.14)

Let {yk }k be a sequence in Px O converging to y , F = {X 1 , . . . , X m } be an


orthonormal family in a neighborhood V along , and u the control associated with
x,1
is valued in Rn ; denote by E1 , . . . , En
through F . The End-Point mapping EF
its n coordinates. The vector space (we identify L 2 ([0, 1]; Rm ) with its dual)


Span Du E1 , . . . , Du En
has dimension rank(y) = r . Let i1 , . . . , ir {1, . . . , n} be such that

2.3 The Sub-Riemannian Exponential Map





Span Du Ei1 , . . . , Du Eir = Span Du E1 , . . . , Du En .

57

(2.15)

Proceeding as in the proof of Proposition 2.3 and using completeness of (, g),


we show that taking a subsequence if necessary, {k := yk }k converges uniformly to some minimizing geodesic joining x to y. By uniqueness, we infer that
limk+ k = . Furthermore, the proof also shows that the controls uk := uk
which are associated to the k s through the orthonormal family F converges strongly
x,1
and the fact that
to u in L 2 ([0, 1]; Rm ) (see Remark 2.1). Then by regularity of EF
rank(yk ) r , we deduce that rank(yk ) = rank(y) for k large enough and that




Span Duk Ei1 , . . . , Duk Eir = Span Duk E1 , . . . , Duk En .
By (2.14) and (2.15), there is = ( 1 , . . . , r ) Ty M (Rr ) such that (remember
that we identify L 2 ([0, 1]; Rm ) with its dual)
r

j Duy Eij = uy ,

j=1

and more generally for every k there is k = (k1 , . . . , kr ) Tyk M (Rr ) such
that
r

kj Duk Eij = uk ,

j=1
x,1
x,1
Since {uk }k converges to u in L 2 ([0, 1]; Rm ), {Duk EF
}k converges to Du EF
and
k

Du Ei1 , . . . , Du Eir are linearly independant, we infer that { }k tends to as k tends


to +. Define {pk }k and p in Tx M by

x,pk (1) = (yk , k /2) k and x,p (1) = y , /2 .


By regularity of the Hamiltonian flow, {pk }k tends to p and if a bounded sequence
{pk + qk }k is contained in k then it converges (up to a subsequence) to some
point in y . This shows that k tends to y . All in all we proved that the mapping
y Px y is continuous at y . Let S be a smooth compact submanifold of
dimension r in Tx M which is transverse to y at p , that is such that
y S = {p} and Tp S Tp y = {0}.
By regularity of y y , there is an open neighrborhood O O of y such that S
is transverse to any y with y Px O . We infer that

58

2 Sub-Riemannian Geodesics

{y} = expx y = expx y S expx (S )

y Px O .

But since S has dimension strictly less than n, the set expx (S ) is a compact
set of measure zero in M. Then Px O cannot be dense in O . Which gives a
contradiction.

Returning to the proof of Theorem 2.14, we fix y Px with rank(y) = n. Given
an open set M, we call a function f : R Lipschitz in charts if it is Lipschitz
in a set of local coordinates in a neighborhood of any point of . This is equivalent
to saying that f is locally Lipschitz with respect to a Riemannian distance on M.
Lemma 2.18 There is an open set Oy of y in M such that the function


y Oy dSR x, y
is Lipschitz in charts.
Proof (Proof of Lemma 2.18) As before we fix an orthonormal family of vector
fields F in an open neighborhood V along := y which is associated with
u L 2 ([0, 1]; Rm ) through F . By a uniqueness-compactness argument, if {yk }k
converges to y and {k }k is a sequence of minimizing geodesics between x and yk
then it converges (up to a subsequence) to and is associated with a sequence of controls {uk }k which converges to u in L 2 ([0, 1]; Rm ) (see Proposition 2.3 and Remark
2.1). Then there is a neighborhood O of y such that for every y O every minimizing geodesic between x and y is contained in V with rank n. Let v1 , . . . vn in
L 2 ([0, 1], Rm ) be such that the linear operator
Rn Ty M

x,1
i
m
i=1 i Du EF v
x,1
is invertible. By continuity of u Du EF
, taking O smaller if necessary, we may
assume that for every y O and for every minimizing geodesic y from x to y
associated with a control uy , the linear operator

Rn Ty M

x,1
i
m
i=1 i Duy EF v
is invertible. For every y O, define F y : Rn M by

x,1

F () := EF
y

u +
y

i v

Rn .

i=1

This mapping is well-defined and smooth in a neighborhood of the origin, satisfies


F y (0) = y,

2.3 The Sub-Riemannian Exponential Map

59

and its differential at 0 is invertible. Hence by the Inverse Function Theorem, there are
an open neighborhood B y of y in M and a function G y : B y Rn with G y (y) = 0
such that
F y G y (z) = z

z B y .

From the definition of the sub-Riemannian distance between two points, we infer
that for any z B y we have

y
y i
G (z) i v
dSR (x, z) = eSR (x, z) u +

i=1

=: y (z).

L2

We conclude that, for every y O, there are a open set B y containing y and a C 1
function y : B y Rn such that
dSR (x, y) = y (y) and dSR (x, z) y (z) z B y .
The C 1 norms of the y s are uniformly bounded. This proves the lemma.

To conclude the proof of Theorem 2.14, we note that by the Rademacher Theorem,
the function y Oy dSR (x, y) is differentiable almost everywhere in Oy . By
Lemma 2.15, for every y Oy where the function is differentiable, there is py Tx M
such that


1
2
(x, ) .
y = expx py , dSR (x, y) = 2H x, py , x,py (1) = y, Dy dSR
2
Since dSR (x, ) is Lipschitz in Oy , there is some constant K > 0 such that all the
py s remain in a compact subset of Tx M. Now every y Oy can be approximated
by a sequence {yk }k of points in Oy where dSR (x, ) is differentiable. By compactness, up to taking a subsequence, the normal extremals starting at (x, pyk ) will converge to a normal extremal starting whose projection is a minimizing geodesic from
x to y.

Remark 2.6 We already know that the sub-Riemannian distance is continuous on
M M (see Proposition 1.13). The proof of Theorem 2.14 shows that if (, g) is
complete and x M be fixed, then the function y M dSR (x, y) R is locally
Lipschitz (in charts) on an open and dense subset of M.

2.4 The Goh Condition


Theorem 2.9 provides firt-order conditions for a given horizontal path to be a minimizing geodesic. The aim of the present section is to present a second-order necessary
condition for a given singular path to be minimizing. For sake of simplicity, we fix an

60

2 Sub-Riemannian Geodesics

orthonormal family F = {X 1 , . . . , X m } of smooth vector fields in some open chart


V which contains a minimizing geodesic : [0, 1] M from x to y (with x = y).
As before, we denote by u = u the control which is associated with through F .
Recall that C L 2 ([0, 1]; Rm ) is defined by
C(u) := u2L2

u L 2 [0, 1]; Rm .

Define F : L 2 ([0, 1]; Rm ) Rn R by




x,1
(u), C(u)
F(u) := EF

u L 2 [0, 1]; Rm .

The Lagrange Multiplier Theorem asserts that if u minimizes C(u) under the conx,1
(u) = y, then there are (Rn ) and 0 {0, 1} with (, 0 ) = (0, 0)
straint EF
such that
x,1
= 0 Du C.
Du EF

In Sect. 2.2, we saw that whenever 0 = 0 we cannot deduce that satisfies the
geodesic equation, that is that it is the projection of a normal extremal. In the case
0 = 0, the control u UFx,1 is necessarily singular which means that it is a critical
x,1
point of EF
. Thus we have to study what happens at second order.
Let U be an open set in L 2 = L 2 ([0, 1]; Rm ) and F : U RN be a function of
class C 2 with respect to the L 2 -norm. We recall that we call critical point of F any
u U such that Du F : U RN is not surjective. Given a critical point u, we call
corank of u, the quantity

corankF (u) := N dim Im Du F .


For every u U the second differential of F at u is the quadratic mapping on
Du2 F : L 2 RN satisfying
1
F(u + v) = F(u) + Du F(v) + Du2 F (v, v) + v2L2 o(1).
2
If Q : L 2 R is a quadratic form, we define its negative index by


ind (Q) := max dim(L) | Q|L\{0} < 0 .
We are now ready to state the result whose proof is given in Appendix B.
Theorem 2.19 Let F : U RN be a mapping of class C 2 in an open set U L 2
and u U be a critical point of F of corank r. If

2.4 The Goh Condition

61

 

ind Du2 F


|Ker(Du F)

Im Du F
\ {0},

then the mapping F is locally open at u , that is the image of any neighborhood of u
is an neighborhood of F(u).
From Proposition 1.11 and Remark 1.8, we know that for every non-zero form
p (Rn ) with
x,1
= 0,
p Du EF

the absolutely continuous arc p : [0, 1] (Rn ) \ {0} defined by


S(t)
1
p (t) := p S(1)

t [0, 1],

(2.16)

satisfies p (1) = p ,
p (t) =

u i (t) p (t) Du (t) X i

a.e. t [0, T ],

i=1

and

p (t) X i u (t) = 0

t [0, T ], i = 1, . . . m,

where S : [0, T ] Mn (R) is the solution to the Cauchy problem


= A(t)
S(t)

S(t)
a.e. t [0, T ], S(0)
= In ,

(2.17)

Mn (R), B(t)
Mn,k (R) are defined by
and the matrices A(t)
:=
A(t)

u i (t)JX i u (t)

a.e. t [0, T ]

(2.18)

i=1

and


:= X 1 (u (t)), , X m (u (t))
B(t)

t [0, T ].

(2.19)

The following result combined with Theorem 2.19 will yield a necessary condition
for a minimizing horizontal path to be strictly
abnormal. It holds in the general case
of a control u which belongs to UFx,1 L [0, 1]; Rm . We do not need u to be
minimizing.

Theorem 2.20 Let u UFx,1 L [0, 1]; Rm and p (Rn ) \ {0} be such that

62

2 Sub-Riemannian Geodesics
x,1
p Du EF
= 0.

(2.20)

Assume that
 

x,1
ind p Du2 EF


x,1
Ker(Du EF
)

< +.

(2.21)

Then the absolutely continuous arc p : [0, 1] (Rn ) \{0} defined by (2.16) satisfies



p (t) X i , X j u (t) = 0

t [0, 1], i, j = 1, . . . , m.

(2.22)

Proof Let us first check that F is of class C 2 on UFx,1 (we refer the reader to Appendix
x,1
B for basics in differential calculus in infinite dimension). Given u UF
and
2
m
v L ([0, T ]; R ) we need to study the quantity

x,1

x,1

u + v EF
u ,
u+v (1) u (1) = EF
at second order when is small. We have

u+v (1) =
0

k
1

(ui (t) + vi (t)) X i u+v (t) dt,

(2.23)

i=1

with u+v (0) = x. For every i = 1, . . . , m and every t [0, 1], the Taylor expansion
of each X i at u (t) at second order gives

X i u+v (t) = X i u (t) + Du (t) X i u+v (t) u (t)

1
+ D2u (t) X i u+v (t) u (t), u+v (t) u (t)
2
+ |u+v (t) u (t)|2 o(1).

Setting x (t) := u+v (t) u (t) for any t, (2.23) yields formally (x has size )

x (1) =

! m

ui (t)Du (t) X i x (t) +

i=1

+
0

"

vi (t)X i u (t) dt

i=1

i=1
2
+ v o(1).

"
m
1
2
i
vi (t)Du (t) X x (t) +
ui (t)Du (t) X (x (t), x (t)) dt
2
i

i=1

Writing x (t) as x (t) = x1 (t) + x2 (t) + o(2 ) where x1 is linear in and x2 is


quadratic in we infer that x1 and x2 must satisfy formally

2.4 The Goh Condition

!
x1 (t)

63

"
ui (t)Du (t) X

x1 (t) +

i=1

"

vi (t)X u (t)

a.e. t [0, 1]

i=1

and
x2 (t)

! m

"
ui (t)Du (t) X

x2 (t) +

i=1

1
2

"
vi (t)Du (t) X

x1 (t)

i=1



ui (t)D2u (t) X i x1 (t), x1 (t)

a.e. t [0, 1].

i=1

Then from the Taylor expansion


u+v (1) = u (1) + x1 (1) + x2 (1) + o 2 ,
we obtain that (here we use the notations of the proof of Proposition 1.8)

x,1
Du2 EF

(v, v) = 2

S(1)S(t)1 [C(t) + D(t)] dt

v L 2 [0, 1]; Rm

where
C(t) =

vi (t)Du (t) X i x1 (t)

(2.24)

i=1

and

1
ui (t)D2u (t) X i x1 (t), x1 (t) .
2
m

D(t) =

(2.25)

i=1

x,1
is C 2 on UFx,1 .
Using Gronwalls lemma, we leave the reader to check that EF

Let us now fix u UFx,1 L [0, 1]; Rm and p (Rn ) \ {0} such that (2.20)
and (2.21) are satisfied and prove that (2.22) holds. Note that we have for every
v L 2 ([0, 1]; Rm ) ,


x,1
(v, v) = 2
Du2 EF



S(t)
1 C(t)
+ D(t)

S(1)
dt,

(2.26)

D
are obtained by replacing u by u in (2.24)(2.25) and the definitions of
where C,
A,
B (see (2.17)(2.19)).
S,
Lemma 2.21 There is K > 0 such that
 for any t , > 0 with [t , t + ] [0, 1],
x,1
there holds for every v Ker Du EF with Supp(v) [t , t + ],

64

2 Sub-Riemannian Geodesics



 2 x,1
t , (v) K v2 2 2 ,
Du EF (v, v) Q
L
t , : L 2 ([0, 1]; Rm ) Rn is defined by
where Q

 t +
 t
m
m


i
j
t , (v) :=
p t
vi (t)D (t ) X
vj (s)X (t ) ds dt,
Q
t

t
j=1

i=1

(2.27)

for every v L 2 ([0, 1]; Rm ).


Proof (Proof of Lemma 2.21) Let t , > 0 with [t , t + ] [0, 1] and v
x,1
Ker(Du EF
) with Supp(v) [t , t + ] be fixed. By Remark 1.5, we have

S(1)

1 B(t)v(t)

S(t)
dt = 0.

Then (2.26) yields






x,1
p Du2 EF

|Ker

x,1
(du EF
)

(v) = 2



+ D(t)

dt.
p (t) C(t)

Setting

x1 (t) := S(t)

1 B(s)v(s)

S(s)
ds

t [0, 1],


  
we have x1 (t) = 0 for every t  0, t  t + , 1 and by Cauchy-Schwarzs
inequality, we have for every t t , t + ,





 1 
S(s)
1 B(s)

t t vL2 K1 vL2 ,
x (t) sup S(t)
s[0,1]

S 1 , B in a neighborhood
where K1 is a constant depending only upon the sizes of S,
of the curve u ([0, 1]). Then we have

D(t)
=0
and


  
t t 0, t t + , 1 ,




D(t)
 K3 v2 2 u
L

which gives






1
0



t t , t + ,




p (t) D(t) dt  K4 v2L2 2 ,

2.4 The Goh Condition

65

where K3 , K4 are some constants depending on K1 , on the size of the D2 X j s, p and


uL . Note that since we can write ( = u )
x1 (t)

 t
m
t
j=1

= x1 (t)
=
=

 t
0

vj (s)X j (t ) ds

 t
m
t
j=1

vj (s)X j (s) ds +

 t
m
t
j=1

S(s)
1 B(s)v(s)

S(t)
B(s)v(s)
ds +

 t

1 B(s)v(s)
S(s)

S(s)
S(t)
ds +

vj (s) X j (s) X j (t ) ds

 t
m

t
j=1
 t
m

t
j=1

vj (s) X j (s) X j (t ) ds

vj (s) X j (s) X j (t ) ds,

we have (since u belongs to L [0, 1]; Rm , S and are both Lipschitz)






 t
m

 1



j
 K2 vL2 23 , t t t , t + ,
 (t)

v
(s)X

(
t
)
ds
j

 x
t


j=1

(2.28)

S 1 , B and the Lipschitz


where K1 is a constant depending only upon the sizes of S,
constants of the X j s in a neighborhood of the curve u ([0, 1]). By (2.27), we have
 t +
dt Q
t , (v) =
dt Q
t , (v)
p (t) C(t)
p (t) C(t)
t
0

 t +
 t
m
m
m

i 1
i
j

=
p (t)
vi (t)D (t) X x (t)
vi (t)D (t ) X
vj (s)X (t ) ds dt


=

i=1
t +

p (t)

i=1

j=1


 t
m

vi (t)D (t) X i x1 (t)


vj (s)X j (t ) ds dt.
t

i=1

j=1

By (2.28), we infer that




 t +


dt Q
t , (v) K5 v2 2 2 ,
p (t) C(t)

L
 t

for some constant K5 depending on the datas. All in all, we get





1
0






p (t) C(t) + D(t) dt Qt , (v) K6 v2L2 2 ,

for some constant K6 depending on the datas. We conclude easily.

66

2 Sub-Riemannian Geodesics

Returning to the proof of Theorem 2.20, we argue by contradiction and assume


that (2.22) does not hold. Hence we assume that there are t (0, 1) and i = j
{1, , m} such that

Ni,j (t ) := p t X i , X j (t ) > 0

= p t D (t ) X j X i (t ) D (t ) X i X j (t ) .
t , : L 2 ([0, 1] Rn be the mapping
Let > 0 such that [t , t + ] [0, 1] and Q

defined by (2.27). We observe that there holds for every v L 2 [0, 1]; Rm ,
t , (v) =
Q


t


=

t +

t
t +




vi (t)vj (s) p (t ) D (t ) X i X j (t ) ds dt

i,j=1
t

v(s), Mv(t)
ds dt =


t

t +

w(t), Mv(t)
dt,

is the m m matrix defined by


where M

i,j = p (t ) D (t ) X i X j (t ) ,
M
and

w(t) :=

v(s) ds

t [0, 1].

Thanks to Lemma 2.21, in order to get a contradiction, we need


to show

that for every


integer N > 0, there are > 0 and a subspace L L 2 [0, 1]; Rm of dimension
t , to L \ {0} satisfies the following property:
larger than N such that the restriction of Q
t , (v) < K v2 2 2
Q
L

v L \ {0}.

As a matter of fact, given N N strictly larger than n, if L is a vector subspace of


dimension N, then the linear operator


x,1
Du EF

|L

: L Rn

has a kernel of dimension at least N n, which means that




x,1
L
Ker Du EF
has dimension at least N n.

2.4 The Goh Condition

67



Let N an integer strictly larger than n be fixed and > 0 with t , t + [0, 1]

to be chosen later. Denote by L = Lt ,,N the vector space in L 2 [0, 1]; Rm of all the
controls v such that there is a sequence {a1 , . . . , aN } such that




v (t) = N ak cos k (tt )2
k=1
i


v (t) = N ak sin k (tt )2
k=1
j



t t , t + ,



t
/ t , t + ,

vi (t) = vj (t) = 0
and
vi (t) = 0, i = i, j
Let v L \ {0}, taking as before w(t) :=

wi (t) =

w (t) =

N

+t
t

t [0, 1].

v(s)ds, we have




sin k (tt )2



ak
(tt )2
,
k 1 cos k

ak
k=1 k

N
k=1

wi (t) = wj (t) = 0



t t , t + ,



t
/ t , t + ,

and
wi (t) = 0, i = i, j

t [0, 1].

Then we have

0

wi (t)vj (t) dt =

+ 2 2

a
k

k=1

4 k

and

0

wj (t)vi (t) dt =

+ 2 2

a
k

k=1

4 k

We have for every t [0, 1]

v (t) + w (t)M
v (t) + w (t)M
v (t) + w (t)M
v (t).
w(t), Mv(t)
= wi (t)M
ii i
i
ij j
j
ji i
j
jj j

But

68

2 Sub-Riemannian Geodesics

1
0

v (t) dt = M

wi (t)M
ii i
ii


0


wi (t)w i (t) dt = 0 =

v (t) dt.
wj (t)M
jj j

In conclusion, we have
t , (v) =
Q

1
0

N
N
2 Ni,j t


2 ak2
ak2

=
.
w(t), Mv(t) dt = Ni,j t
4 k
4
k
k=1

k=1

t , (v) is negative. Moreover, we observe that


Since Ni,j t > 0, Q
v2L2 =

ak2 ,

k=1

which yields



Q
t , (v)
v2L2 2

N a 2


k
Ni,j t
1 Ni,j t
k=1 k
=

.

2
4 N

4
k=1 ak

We conclude easily by taking > 0 small enough.

A minimizing geodesic is called strictly abnormal if it is singular and admits no


normal extremal lift. A control is called strictly abnormalif its associated horizontal
path is strictly abnormal.
Theorem 2.22 Let : [0, 1] M be a minimizing geodesic from x to
y (with

x = y) which is strictly abnormal. Then there is an abnormal lift = , p :


[0, 1] T M of such that



p (t) X i , X j (t) = 0

t [0, 1], i, j = 1, . . . , m.

The latter property is called the Goh condition and is a Goh path.
Proof According to the previous notations, we define the mapping F : UFx,1
Rn R by


x,1
u UFx,1 .
(u), u2L2
F(u) := EF
This function, which is of class C 2 , cannot be open at u . As a matter of fact, if the
image of a neighborhood of u contains a neighborhood of F(u) then it contains a
control u UFx,1 with
2
x,1
EF
(u) = y and u2L2 u L2 ,
which contradicts the minimality of u from x to y. Therefore by Theorem 2.19 we

\ {0} such that


infer that there is Im Du F

2.4 The Goh Condition

69

 

ind Du2 F

|Ker(du F)

< r := n rank(u).

Since the control u is strictly abnormal, the last coordinates of is zero. Denote by
p the dual of the first n coordinates of . Then we have

v L 2 [0, 1]; Rm .

x,1
p Du EF
(v) = 0

Since u is minimizing, |u(t)| is constant and u belongs to L [0, 1]; Rm . Theorem 2.20 concludes the proof.

Example 2.1 A distribution is called medium-fat if, for every x M and every
section X of with X(x) = 0, there holds


Tx M = (x) + [, ](x) + X, [, ] (x),

(2.29)

where




, (x) := [X, Y ](x) | X, Y sections of
and






X, [, ] (x) := X, [Y , Z] (x) | Y , Z sections of .

Any two-generating distribution is medium-fat. An example of medium-fat distribution which is not two-generating is given by the rank-three distribution in R4 with
coordinates x = (x1 , x2 , x3 , x4 ) generated by the vector fields defined by
X 1 = x1 , X 2 = x2 , X 3 = x3 + (x1 + x2 + x3 )2 x4 .
Medium-fat distribution do not admit non-trivial Goh paths. As a matter of fact, if
: [0, T ] M is an horizontal path which admits an abnormal lift = ( , p) :
[0, T ] T M satisfying the Goh condition, then we have



p(t) X i , X j (t) = 0

i, j = 1, . . . , m,

(2.30)

for every t in a small interval I [0, T ] such that (t) is in a local chart of M and
is parametrized by a family {X 1 , . . . , X m } of smooth vector fields. Then if we denote
by u the control which is associated to through F , derivating the previous equality
yields for any i, j = 1, . . . , m,
p(t)

! m

"


(t) = 0
uk (t)X , X , X
k

t I.

(2.31)

k=1

Since = ( , p) is an abnormal lift, we also have p X i = 0 along , then by (2.29),


(2.30) and (2.31) we get a contradiction.

70

2 Sub-Riemannian Geodesics

2.5 Examples of SR Geodesics


Geodesics in the Heisenberg group. The Heisenberg group H1 is the
sub-Riemannian structure (, g) in R3 where is the totally nonholonomic rank 2
distribution (see Example 1.2) spanned by the vector fields
y
X = x z
2

and

x
Y = y + z ,
2

and g is the metric making the family {X, Y } orthonormal, that is defined by
g = dx 2 + dy2 .
The above structure can be shown to be left-invariant under the group law



1





(x, y, z)  x , y , z = x + x , y + y , z + z + xy x y .
2
Thanks to Proposition 1.7, any horizontal path on [0, T ] has the form u = (x, y, z) :
[0, T ] R3 where

x (t) = u1 (t)
y (t) = u2 (t)

z (t) = 21 (u2 (t)x(t) u1 (t)y(t)) ,

for some u L 2 [0, T ]; R2 . This means that



z(T ) z(0) =
0

1
(x(t)y(t) y(t)x (t)) dt =
2


c

1
(xdy ydx) ,
2

where (t) = (x(t), y(t)) is the projection of the curve to the plane. According to
the Stockes Theorem, we have



1
1
dx dy +
(xdy ydx) =
(xdy ydx) ,
2
2

D
c
where D denotes the domain which is enclosed by the curve and the segment




c := Q1 , Q2 := (x(0, y(0)), (x(T ), y(T )
from Q1 to Q2 (see Fig. 2.3).
Therefore, given two points P1 = (x1 , y1 , z1 ), P2 = (x2 , y2 , z2 ) in R3 , the horizontal paths which minimizes the length from P1 to P2 are the curves : [0, 1] R3
whose signed area of D satisfies

2.5 Examples of SR Geodesics

71

Fig. 2.3 The curve , the domain D and the segment c


D

dx dy = (z2 z1 )
c

1
(xdy ydx) ,
2

with minimal length. According to the isoperimetric inequality, the curves in the
plane sweeping the same area and which minimize the length are given by circles.
This fact can be easily recovered by Theorem 2.9 and Proposition 2.8 (we saw
in Example 1.14 that admits no non-trivial singular horizontal paths). Assume
that u = (x, y, z) : [0, 1] R3 is a minimizing geodesic from P1 := u (0)
to P2 := u (1) = P1 . Then according to Proposition 2.8, there is a smooth arc
p = (p1 , p2 , p3 ) : [0, 1] (R3 ) such that the following system of differential
equations holds

y
x = px 2 pz
y = py

+ 2x pz

z = 21 py + 2x pz x px 2y pz y ,

x
pz p2z
p x =
2

py +
p = px 2y pz p2z
y
p z = 0.

Hence pz = p z for every t. Which implies that


x = pz y and y = p z x .
If p z = 0, then the geodesic from P1 to P2 is a segment with constant speed. If
p z = 0, we have or
...
...
x = p2z x and y = p2z y .
Which means that the curve t (x(t), y(t)) is a circle.
A singular minimizing geodesic. As we said above, minimizing geodesics do not
necessarily satisfy the Hamiltonian geodesic equation. As an example, consider the
Martinet-like distribution (see Examples 1.10 and 1.16) in R3 (with coordinates
(x1 , x2 , x3 )) generated by

72

2 Sub-Riemannian Geodesics

X = x1 , Y = (1 + x1 (x)) x2 + x12 x3 ,
where is a smooth function and equipped with a metric g making {X, Y } an
orthonormal family. In a sufficiently small neighborhood of the origin V , singular curves are given by the horizontal paths which are contained in the Martinet set


= x1 = 0 ,
that is of the form


 t
x(t) = 0, x2 (0) +
u2 (s)ds, 0, x3 (0) ,
0

with u2 L 2 ([0, T ]; R). Such curves are locally minimizing.


Theorem 2.23 There is > 0 such that for every (0, ) the horizontal path
given by

(t) = 0, t, 0

t [0, ],

minimizes the length among all horizontal paths joining (0, 0, 0) to (0, , 0).
Proof We need to show that among all controls u = (u1 , u2 ) : [0, ] R2 with
u12 + u22 1 steering the origin to P := (0, , 0), we have < . There is r > 0 such
that B SR (0, r) is included in V . If (0, r), then any minimizing geodesic joining
0 to (0, , 0) is contained in B SR (0, r). As a matter of fact, we know that dSR (0, P)
< r. Let C > 0 be upper bounds for on B SR (0, r) Let u = x : [0, ] R3 be
a competitor for . We get easily

+
x1 ( ) = +0 u1 (s) ds

=0

x2 ( ) = +0 u2 (s) 1 + x1 (s) x(s) ds =

x3 ( ) = 0 u2 (s)x1 (s)2 ds = 0.

(2.32)

Set



:= max x1 (s) | s [0, ] .

(2.33)

Note that if u = , then is necessarily positive. Taking r > 0 smaller if necessary


(and a fortiori > 0 smaller), we may assume that 1/(2C). The last equation
in (2.32) yields (2.33)

0

x1 (s)2 1 u2 (s) ds +
0



u2 (s) ds .
2

x1 (s)2 ds =

x1 (s)2 u2 (s) ds

(2.34)

2.5 Examples of SR Geodesics

73

Let s [0, ] be such that |x1 (s)| = . Since |x1 (s)| 1 for almost every s [0, ]
and x1 (0) = x1 ( ) = 0, we have s , s . Which means that the interval
[s /2, s + /2] is included in [0, ] and


x1 (s)
2



s s /2, s + /2 .

Therefore we have

x1 (s)2 ds

s +/2

s /2

x1 (s)2 ds

3
.
4

By (2.34), we deduce that





3
u2 (s) ds
2
4
0
which implies


u2 (s) ds

.
4

(2.35)

Then by the second line in (2.32) and the definitions of and C, we have


0
0

u2 (s) ds +
u2 (s) ds +

0

u2 (s)x1 (s) x(s) ds







u2 (s) x1 (s)  x(s)  ds

u2 (s) ds + C.

Consequently by (2.35), we get




1
.
+ C
4
In conclusion, if > 0 and < 1/(4C) (that is > 0 small enough), then cannot
be smaller than . This shows the result.

According to Proposition 2.8, for every p = (p1 , p 2 , p 3 ) (R3 ) , the normal
extremal (with respect to g) on [0, 1] starting at (0, p) is the trajectory (x, p) :
[0, 1] R3 (R3 ) satisfying

x 1 = p1
x 2 = (p Y (x)) (1 + x1 (x))

x 3 = (p Y (x)) x12 ,

(2.36)

74

2 Sub-Riemannian Geodesics

p
(x)
+
x
=

Y
(x))
(x)
+
2p
x
(p
1
2
1
3
1




(x)
p 2 = p Y (x) p2 x1 x
2




p 3 = p Y (x) p2 x1 (x) ,
x3

(2.37)

with

p Y (x) = p2 1 + x1 (x) + p3 x12


and
x1 (0) = x2 (0) = x3 (0) = 0, p1 (0) = p 1 , p2 (0) = p 2 , p3 (0) = p 3 .

(2.38)

2
2
Note that

if
0, that is whenever g = dx1 + dx2 , then the horizontal path given by
(t) = 0, t, 0 for any t [0, ] is the projection of the normal extremal starting at
(0, p ) with p = (0, 1, 0). Then it is a singular normal minimizing geodesic between
its end-points (see Example 1.16 and Theorem 2.9). Different choices of metrics can
provide examples of strictly abnormal minimizing geodesics.

Proposition 2.24 If (0) = 0, then any reparametrization of is not the projection


of a normal extremal.
Proof We argue by contradiction and assume that there is p = (p1 , p 2 , p 3 ) (R3 )
and : [0, 1] R3 a reparametrization of such that the systems differential Eqs.
(2.36) and (2.37) are satisfied with x = and initial conditions (2.38). The system
(2.36) and (2.37) is the Hamiltonian system which is associated with the Hamiltonian
given by
H(x, p) =

1
1
(p X(x))2 + (p Y )2 .
2
2

Since H is constant along its extremals and x1 (t) = x3 (t) = 0 for any t [0, ], we
have

2
p(t) X (t)
+ p(t) Y (t)
= p1 (t)2 + p2 (t)2 = p 21 + p 22

t [0, 1].

On the other hand, since x1 (t) = 0 for every t [0, 1], the second and third equations
in (2.37) yield
p 2 = p 3 = 0 = p2 (t) = p 2 t [0, 1].
Moreover, (2.36) also gives x 2 = p 2 that is p 2 = 0 ( has constant speed). Since p1
is smooth and both p2 and p21 + p22 are constant, p1 is necessarily constant. The first
equation in (2.36) and x1 = 0 give x 1 = p1 . Hence p1 = p 1 = 0. Then, using that
p1 = 0, p2 = p 2 = 0, the first equation in (2.37) gives

2.5 Examples of SR Geodesics

75

p 2 (t) = 0

t [0, 1].

By assumption on (0), we deduce that p 2 = 0. Since we know that joins 0 to


(0, , 0) with = 0, this contradicts the equality x 2 = p 2 .


2.6 Notes and Comments


Theorem 2.9 may be seen as a weak form of the Pontryagin maximum principle which
has been developed by the russian school of control in the 60s. In the general context
of optimal control theory, the strong form of the Pontryagin maximum principle
provides necessary conditions for a control to be optimal. For further details on this
topics, we refer the reader to the seminal book by Pontryagin and its collaborators
[9] and to the more recent textbooks by Agrachev and Sachkov [2], Clarke [4], or
Vinter [11]. The material presented in Sects. 2.1 and 2.2 is by now classical. It can
be found in the Montgomery textbook [8] which also provides many references.
Theorem 2.14 about the image of the sub-Riemannian exponential map has been
proven by Agrachev and the author, see [1]. It extends a previous density result,
based on Lemma 2.15, which was obtained by Trlat and the author in [10]. Given a
complete sub-Riemannian structure (, g) on a smooth manifold M and x M, we
do not know if the image of expx has full Lebesgue measure in M. This open problem
is indeed contained in the sub-Riemannian Sard conjecture. Given x M (which
is equipped with a SR structure), denote by Sx,1 the set of singular horizontal paths
x,1
x,1
x,1
in
(that is Sx,1 :=
\ R
with the notations of Chap. 1). The SR Sard
x,1
conjecture states that the image of S by the End-Point mapping
x,T
x,1
E
:
M
(1),
x,T
can have a non-empty
has Lebesgue measure zero in M. We even do not know if E
interior in M. We refer the reader to Montgomerys book [8] for further details on
the SR Sard Conjecture and to the paper [10] for various sub-Riemannian Sard-like
conjectures.
The theory of second variation for singular geodesics in sub-Riemannian geometry
has been developed by Agrachev and its collaborators. The results and proofs that
we present in Sect. 2.4 are taken from Agrachev-Sarychevs paper [3]. Example 2.1
(medium-fat distributions) is taken from [3] as well.
For decades the prevailing wisdom was that every sub-Riemannian minimizing
geodesic is normal, meaning that it admits a normal extremal lift. In 1991, Montgomery [7] found the first counter-example to this assertion. We refer the reader to
Montgomerys book [8] for an historical account on the existence of strictly abnormal minimizing geodesics. The second example which is presented in Sect. 2.5 is
moreorless the Montgomery counter-example. The proof of local minimality of

76

2 Sub-Riemannian Geodesics

characteristic lines in the Martinet surface (Theorem 2.23) is taken from the
monograph by Liu and Sussmann [6] which provide a more general class of counterexamples. Note that the Montgomery counter-example as well as all other known
counter-examples exhibit smooth singular minimizing curves. The existence of nonsmooth sub-Riemannian geodesics is open.
In the first example of Sect. 2.5, we briefly explained that the sub-Riemannian
structure under study was indeed left-invariant under some group law. This additional
structure makes H1 a Carnot group. We refer the reader to the Montgomery textbook
[8] or to the Jean monograph [5] for further details on Carnot groups.

References
1. Agrachev, A.: Any sub-Riemannian metric has points of smoothness. Dokl. Akad. Nauk. 424(3),
295298 (2009), translation in. Dokl. Math. 79(1), 4547 (2009)
2. Agrachev, A.A., Sachkov, Y.L.: Control Theory from the Geometric Viewpoint. Encyclopaedia
of Mathematical Sciences, vol. 87, Springer, Heidelberg (2004) NULL
3. Rifford, L., Trlat, E.: Morse-Sard type results in sub-riemannian geometry. Math. Ann. 332(1),
145159 (2005)
4. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, New York (1983)
Republished as vol. 5 of Classics in Applied Mathematics, SIAM (1990)
5. Jean, F.: Control of Nonholonomic Systems and Sub-Riemannian geometry. Lectures given at
the CIMPA School Gomtrie sous-riemannienne, Beirut (2012)
6. Montgomery, R.: Abnormal minimizers. SIAM J. Control Optim. 32(6), 16051620 (1994)
7. Agrachev, A., Sarychev, A.: Sub-Riemannian metrics: minimality of singular geodesics versus
subanalyticity. ESAIM Control Optim. Calc. Var. 4, 377403 (1999)
8. Montgomery, R.: A tour of subriemannian geometries, their geodesics and applications. Mathematical Surveys and Monographs, vol. 91. American Mathematical Society, Providence, RI
(2002)
9. Pontryagin, L., Boltyanskii, V., Gamkrelidze, R., Mischenko, E.: The Mathematical Theory of
Optimal Processes. Wiley-Interscience, New-York (1962)
10. Liu, W., Sussmann, H.J.: Shortest paths for sub-Riemannian metrics on rank-2 distributions.
Mem. Amer. Math. Soc. 118, 564 (1995)
11. Vinter, R.B.: Optimal Control. Birkhuser, Boston (2000)

Chapter 3

Introduction to Optimal Transport

Abstract This Chapter is concerned with the study of optimal transport maps in the
sub-Riemannian setting. We first provide a course in optimal transport theory. Then
we study the well-posedness of the Monge problem for sub-Riemannian quadratic
costs.
Throughout all the chapter, M denotes a smooth connected manifold without boundary of dimension n 2.

3.1 The Monge and Kantorovitch Problems


The Monge problem. Let
c : M M [0, +)
be a cost function and , be two probability measures on M. We recall that a
probability measure on M is a Borel measure with total mass 1. The Monge optimal
transport problem from to with respect to the cost c consists in minimizing the
transportation cost




c x, T (x) d(x),
M

among all the measurable maps T : M M pushing forward to (we denote it


by T = ) that is satisfying


T 1 (B) = (B)

B measurable set in M.

Such maps are called transport maps from to (Fig. 3.1).


We set
L. Rifford, Sub-Riemannian Geometry and Optimal Transport,
SpringerBriefs in Mathematics, DOI: 10.1007/978-3-319-04804-8_3,
The Author(s) 2014

77

78

3 Introduction to Optimal Transport

Fig. 3.1 The Monge problem


CM (, ) := inf




c x, T (x) d | T = ,

(3.1)

where T = means implicitely that T is a measurable map from M to itself which


pushes forward to .
Remark 3.1 The property (3.1) is equivalent to



(T (x)) d(x) =

(y) d(y),

for all -integrable function . If M = Rn and and are absolutely continuous with respect to the Lebesgue measure respectively with densities f and g in
L 1 (Rn ; [0, +)), the latter property can be written as



(T (x))f (x) dx =
M

(y)g(y) dy,
M

for any L (Rn ; R). Therefore, if T is a diffeomorphism, then the change of


variable y = T (x) yields the Monge-Ampre equation
 

det Dx T  =

f (x)
g(T (x))

a.e. x Rn .

Example 3.1 Transport maps may not exist. For example, consider in Rn the probability measures , given by
= x and =

1
1
y 1 + y 2 ,
2
2

where x, y1 , y2 Rn , y1 = y2 and a denotes the Dirac mass at some point a Rn .


There are no transport maps from to . If such a map T exists, then





1
= {y1 } = T 1 {y1 } = 0 or 1,
2
which is impossible.

3.1 The Monge and Kantorovitch Problems

79

Example 3.2 Minimizers of Monges problem may not be unique. On the real line
R, consider the probability measures and given by
= 1[0, 1] L 1 and = 1[1, 2] L 1 ,
where L 1 denotes the Lebesgue measure in R. In other terms, and are respectively
the restriction of the Lebesgue measure on the intervals [0, 1] and [1, 2]. The two
maps T1 , T2 : R R given by
T1 (x) = x + 1 and T2 (x) = 2 x

x R,

push forward to . This is a straightforward consequence of the fact that both T1


and T2 are affine maps which are bijective from [0, 1] to [1, 2] with determinant
1 together with a change of variable (see Remark 3.1). Consider the Monge cost
c : R R [0, +) given by
c(x, y) := |y x|

x, y R.

We check easily that the transportation cost for T1 and T2 are given by


c x, Ti (x) d(x) =

|Ti (x) x| dx = 1

i = 1, 2.

Furthermore, we also check that if T is a map which pushes forward to , then



R

c x, T (x) d(x) =

|T (x) x| dx


=


[T (x) x] dx =


=


y dy

0
1

T (x) dx

x dx
0

x dx = 1.

This shows that the infimum in the definition of CM (, ) is attained by all transport
maps from to . So, it is not unique.
The constraint T = being highly non-linear, the Monge optimal transport
problem is quite difficult from the viewpoint of optimization. That is why we will
study a notion of weak solution for this problem.
The Kantorovitch relaxation. Given two probability measures , on M, we denote
by (, ) the set of probability measures in the product M M with first and
second marginals and , that is such that
1 = and 2 = ,

(3.2)

80

3 Introduction to Optimal Transport

where i : M M M denotes respectively the projection on the first and second


variable in M M. The Kantorovitch optimal transport problem with respect to the
cost c : M M [0, +) consists in minimizing the quantity

C() :=

c(x, y) d(x, y),


MM

among all the (, ). Any measure in (, ) is called a transport plan


between and . We set

CK (, ) := inf C() | (, ) .
Remark 3.2 The property (3.2) is equivalent to




(B) = B M and (B) = M B ,
for any measurable set B in M, which is also equivalent to


1 (x) + 2 (y) d(x, y) =

MM


1 (x) d(x) +
M

2 (y) d(y),
M

for all -integrable function 1 and -integrable function 2 . In particular, the set
(, ) is a convex set which always contains the product measure .
Remark 3.3 If T : M M is a transport map from to then the measure on
M M given by
:= (Id T ) ,
is a transport plan between and . This means that the Kantorovitch optimization
problem is more general than the Monge optimization problem, or
CK (, ) CM (, ),
for all probability measures , on M.
Example 3.3 Returning to Example 3.1, we note that the product measure
=

1
1
(x, y1 ) + (x, y2 ) ,
2
2

is a transport plan between and . In contrary to Monges transport maps,


Kantorovitchs transport plans allow splitting of mass.
The Kantorovitch optimal transport problem is an infinite-dimensional optimization problem which involves a functional C which is linear in and a set of constraints
(, ) which is convex and weakly compact. The existence of optimal transport
plans becomes easy.

3.2 Optimal Plans and Kantorovitch Potentials

81

3.2 Optimal Plans and Kantorovitch Potentials


Optimal plans. Throughout this section, we fix a cost c : M M [0, +). We
recall that the support spt() of a measure refers to the smallest closed set F M
of full mass (F) = (M) = 1.
Theorem 3.1 Let , be two probability measures on M. Assume that c is continuous and that Supp() and Supp() are compact. Then the Kantorovitch optimal
transport problem admits at least one solution, that is there is (, ) such that

C()
= CK (, ) := inf C() | (, ) .
Proof We first note that CK (, ) is finite. As a matter of fact, since the product measure belongs to (, ) and c is bounded on Supp() Supp()
(by assumption c is continuous and Supp(), Supp() are compact), we have
CK (, ) C( ) < +. In fact, the supports of all transport plans between
and are contained in the set Supp() Supp() M M which is compact by
assumption on Supp() and Supp(). Then we can assume without loss of generality
that M is compact. Denote by P(M M) the set of probability measures on M M
and define F : P(M M) R by

F() :=

P(M M).

c(x, y) d(x, y)
MM

The functional F is continuous on P(M M) equipped with the topology of weak


convergence, that is for any sequence {k }k and any in P(M M) satisfying



(x, y) dk (x, y) k+
MM

(x, y) d(x, y),


MM

for any measurable function : M R which is bounded, we have


lim F(k ) = F().

k+

This fact is a straigthforward consequence of the continuity of c together with the


compactness of M M. By Prokhorovs Theorem, the set of probability measures
on M M is compact with respect to weak convergence. We conclude easily. Let
{k }k be a sequence in (, ) such that
CK (, ) = lim C(k ).
k+

By Prokhorovs Theorem, up to taking a subsequence, we may assume that {k }


converges to some probability measure .
By Remark 3.2, belongs to (, ).

Moreover it satisfies C()


= CK (, ) by continuity of F.

82

3 Introduction to Optimal Transport

The supports of optimal transport plans have specific properties. Let us introduce
the notion of c-cyclically monotone sets.
Definition 3.2 A subset S M M is called c-cyclically monotone if for any
finite number of points (xj , yj ) S, j = 1, . . . , J, and a permutation on the set
{1, . . . , J},
J
J




c(xj , yj )
c x (j) , yj .
j=1

j=1

Remark 3.4 The definition given above is equivalent to the following one: for any
finite number of points (xj , yj ) S, j = 1, . . . , J,
J


c(xj , yj )

j=1

J



c xj , yj+1 ,
j=1

with yJ+1 = y1 . The equivalence is a straightforward consequence of the decomposition of a permutation into disjoint commuting cycles.
Remark 3.5 If c is assumed to be continuous, the c-cyclical monotonocity is stable
under closure. The closure of a c-cyclically monotone set is c-cyclically monotone.
Given two probability measures , on M, we call optimal transport plan between
and any (, ) satisfying CK (, ) = C(). Optimal transport plans
always have c-cyclically monotone supports.
Theorem 3.3 Let , be two probability measures on M. Assume that c is continuous and that Supp() and Supp() are compact. Then there is a c-cyclically monotone
compact set S Supp() Supp() such that the support of any optimal transport
plan between and is contained in S .
Proof Let us first show that the supports of optimal transport plans are always
c-cyclically monotone. We argue by contradiction and assume that there is an optimal
transport plan (, ) whose support is not c-cyclically monotone. Then there
is an integer J > 1, J points (x1 , y1 ), . . . , (xJ , yJ ) in Supp() and a permutation
on the set {1, . . . , J} such that
J


c(xj , yj ) >

j=1

J



c x (j) , yj .
j=1

By continuity of c, there are open sets Uj , Vj for j = 1, . . . , J which contain respectively xj , yj such that
J

j=1

c(uj , vj ) >

J



c u (j) , vj





J
uj , vj j=1,...,J j=1
Uj Vj .

j=1

(3.3)

3.2 Optimal Plans and Kantorovitch Potentials

83

Each (xj , yj ) belongs to the support


of , then we have (Uj Vj ) > 0. Define the

J
Uj Vj by
probability measure P on j=1

P=

J
j=1


1
 1Uj Vj .

Uj Vj

It is a product of probability measures, hence it is a probability measure as well. Set



m
:= min Uj Vj | j = 1, . . . , J ,


J
denote by Uj (resp. Vj ) the projection from j=1
Uj Vj to Uj (resp. to Vj ) and
define the measure on M M by

J
 U V
m
  U (j) Vj 

, P j , j P .
= +
J
j=1

We have
m

m
 Uj Vj 
1
 1Uj Vj

, P =
J
J

V
j
j
j=1
j=1
J

1
1Uj Vj = 0.
J
J

j=1

Moreover

J
J
J
J








U
U
Uj , Vj P =
U (j) , Vj P
1
j P =
(j) P = 1
j=1

and

j=1

j=1

j=1

J
J
J







V
Uj , Vj P =
U (j) , Vj P .
j P = 2
2
j=1

j=1

j=1

Therefore is a non-negative measure which belongs to (, ). But by construction, we have





c(x, y) d (x,
y) =
MM

c(x, y) d(x, y)
MM

 
J 





 

c u (j) , vj c uj , vj dP u1 , v1 , , . . . , uJ , vJ ,
j=1

84

3 Introduction to Optimal Transport

and the last term is negative [by (3.3)]. This means that cannot be optimal and
gives a contradiction. Then we know that the supports of any optimal transport plan
between and is c-cyclically monotone. Denote by opt (, ) the set of optimal
transport plans in (, ) and set
S :=

Supp().

opt (,)

By construction, S is a subset of Supp() Supp() M M which contains


the supports of all optimal transport plans. It remains to show that S is c-cyclically
monotone. Let (x1 , y1 ), . . . , (xJ , yJ ) be J points in S and be a permutation on the
set {1, . . . , J}. For each j = 1, . . . , J the point (xj , yj ) belongs to the support of an
optimal transport plan j . Let be the convex combination of the j s, that is
1
j .
J
J

:=

j=1

Since (, ) is convex and the mapping C() is linear, belongs to


opt (, ). Then its support is c-cyclically monotone and contains all the (xj , yj )s.
We infer that
J

j=1

J



c(xj , yj )
c x (j) , yj .
j=1

We conclude by Remark 3.5.

Example 3.4 Returning to Example 3.2, we can show that the set provided by
Theorem 3.3 has to be S = [0, 1] [1, 2] = Supp() Supp(). As a matter of
fact, for every (x, y) [0, 1] [1, 2] there is a bijective function T : [0, 1] [1, 2]
which is lower semicontinuous, increasing and piecewise affine with slope 1, and
whose graph contains (x, y) (see Fig. 3.2). Thanks to the observation we did in Example 3.2, such a function is a transport map from = 1[0, 1] L 1 to = 1[1, 2] L 1 ,
hence it is optimal.
Kantorovitch potentials. The aim of this section is to characterize c-cyclically
monotone sets in a more analytic way.
Definition 3.4 A function : M R {+}, not identically +, is said to be
c-convex if there is a non-empty set A M R such that

(x) := sup c(x, y) | (y, ) A

x M.

(3.4)

The c-transform of , denoted by c is the function c : M R {}


defined by

3.2 Optimal Plans and Kantorovitch Potentials

85

Fig. 3.2 Solution to Example 3.4

c (y) := inf (x) + c(x, y) | x M

y M.

The pair (, c ) is called a c-pair of Kantorovitch potentials.


The following result shows that the opposite of a c-convex function is the
c-transform of the opposite of its c-transform.
Proposition 3.5 Given a c-convex function , the function c is c-convex and
we have

x M.
(x) = sup c (y) c(x, y) | y M
Proof By definition of c we have
c (y) c(x, y) (x)

x M, y M.

Which implies that (x) supyM { c (y) c(x, y)} for any x M. Let us show
that (x) supyM { c (y) c(x, y)} for any x M. Argue by contradiction and
assume that there is x M such that

 
 
x > sup c (y) c x , y y M .
Since is c-convex, there are a set A M R, (y, ) A and > 0 such that

 
 
 
c x , y + x sup c (y) c x , y | y M + 3.
 
Then we get c y 2, which by definition of c (y) implies that there is
x M such that
 
(x) + c x, y .
This contradicts (3.4).

86

3 Introduction to Optimal Transport

Example 3.5 If M = Rn and c is given by c(x, y) = |y x|, then the c-convex


functions are exactly the functions which are 1-Lipschitz on Rn . As a matter of fact,
if f : Rn R is 1-Lipschitz then for every x Rn ,
f (x) f (y) |y x|
which yields

y Rn ,

f (x) = sup f (y) c(x, y) | y Rn .

Moreover, f is its own c-transform. Conversely, any c-convex function is a supremum


of 1-Lipschitz function which is not identically +. Then it is finite everywhere
and 1-Lipschitz.
Example 3.6 If M = Rn and c is given by c(x, y) = |y x|2 /2, then the c-convex
functions are the functions : Rn R {+} such that the function
1
x Rn (x) + |x|2
2
is convex. As a matter of fact, any c-convex function can be written as
|x|2

|y|2
x, y | (y, ) A
(x) = sup
2
2

x Rn .

which shows that +||2 /2 is convex as a supremum of affine functions. Conversely,


any convex function on Rn can be expressed as the supremum of affine functions.
That is given a convex function : Rn R {+}, there is a set B Rn R
such that

(x) = sup x, y + | (y, ) B


x Rn .
Then for every x Rn ,
1
(x) |x|2 = sup
2




|y x|2
|y|2
| (y, ) B ,

+
2
2

which shows that := | |2 /2 is c-convex.


The c-cyclically monotone sets are the sets which are contained in the
c-subdifferential of c-convex functions.
Definition 3.6 Let : M R {+} be a c-convex function. For every x M,
the c-subdifferential of at x is defined by

c (x) := y M | c (y) = (x) + c(x, y) .


We call contact set of the pair (, c ) the set defined by

3.2 Optimal Plans and Kantorovitch Potentials

87

c := (x, y) M M | y c (x) .
Remark 3.6 By the above definitions, a pair (x, y) in M M belongs to c if and
only if
(x) + c(x, y) (z) + c(z, y)
z M,
which is also equivalent to
c (y) c(x, y) c (z) c(x, z)

z M.

In particular, both (x) and c (y) are finite.


The following result is the cornerstone of the results of existence and uniqueness
of optimal transport maps that we will present in the next sections.
Theorem 3.7 For S M M to be c-cyclically monotone, it is necessary and
sufficient that S c for some c-convex : M R {+}. In fact, for every
c-cyclically monotone set S M M, there is a c-pair of potentials (, c ) with
S c satisfying

(x) = sup c (y) c(x, y) | y 2 (S)

x M,

(3.5)

c (y) = inf (x) + c(x, y) | x 1 (S)

y M.

(3.6)

If c is continuous and S is compact, then both , c are valued in R and continuous,


and the infimum and supremum in (3.5)(3.6) are attained.
Proof First, given a c convex function : X R{+} the contact set of (, c )
is c-cyclically monotone. As a matter of fact, given (xj , yj ) c , j = 1, . . . , J, and
a permutation on the set {1, . . . , J}, we have
 
 


 




c yj = xj + c xj , yj and c yj x (j) + c x (j) , yj ,
for every j = 1, . . . , J. Hence
J
J
J
J
J

 
  
  


  

c
c
c xj , yj =
yj
xj =
yj
x (j)
j=1

j=1

j=1

j=1

J


j=1



c x (j) , yj .

j=1

Let us now show that a c-cyclically monotone set S M M is necessarily included


in the contact set of some c-convex function. Fix (x , y ) in the c-cyclically monotone

88

3 Introduction to Optimal Transport

set S M M and define : M R {+} by

 


(x) := sup c x , y c x1 , y
+

J1




 



c xj , yj c xj+1 , yj + c xJ , yJ c x, yJ
j=1



| J N, J 2, xj , yj S, j = 1, . . . , J ,

for every x M. We claim that is a c-convex function whose contact set contains S.
First taking J = 2, x = x1 = x2 = x and y1 , y2 = y , we check easily that (x ) 0.
Furthermore, by c-cyclical monotonicity of S, we have
J1
 




 



 
c x , y c x1 , y +
c xj , yj c xj+1 , yj + c xJ , yJ c x , yJ 0,
j=1

for any pairs (x1 , y1 ), . . . , (xJ , yJ ) belonging to S. Thus we have (x ) 0 and in


turn (x ) = 0. This shows that is not identically +. Define : M R
{} by
J1

 






 

c xj , yj c xj+1 , yj + c xJ , y
(y) := sup c x , y c x1 , y +

j=1

| J N, J 2, xj , yj S, j = 1, . . . , J 1, (xJ , y) S

y M.

Note that if y M is such that there are no x M with (x, y) S, then (y) = .
However, as above we check easily that (y) = 0 which shows that is not identically
. Therefore, by construction we have for every x M,

(x) = sup (y) c(x, y) | y 2 (S) = sup (y) c(x, y) | y M ,

(3.7)

which shows that is c-convex. It remains to check that S c . Let (x, y) S be


fixed, we need to show that
(x) + c(x, y) (z) + c(z, y)

z M.

By construction of , we have for every z M,


J1

 
 





c xj , yj c xj+1 , yj
(z) sup c x , y c x1 , y +
j=1


+ c(x, y) c(z, y) | J N, J 2, xj , yj S,

3.2 Optimal Plans and Kantorovitch Potentials

89

j = 1, . . . , J 1, xJ = x

= (x) + c(x, y) c(z, y).


We get the necessary and sufficient condition for a set to be c-cyclically monotone.
Let us now turn to the second part of the result, that is let us prove that for any
c-cyclically monotone set S M M, there is a c-pair of potentials , c with
S c which in addition satisfies (3.5)(3.6).
Let S be a c-cyclically monotone set. We already know that there is : M
R {} which is not identically such that the function : M R {+}
defined by

(x) := sup (y) c(x, y) | y 2 (S)

x M,

(3.8)

is c-convex with S c (remember (3.7)). Let 1 = c : M R {} be the


c-transform of , that is the function defined by

1 (y) := inf (x) + c(x, y) | x M

y M.

(3.9)

If y 2 (S), then there is x M with (x) = (y) c(x, y) and (x, y) S c ,


that is
(y) = (x) + c(x, y) (z) + c(z, y)
z M.
Then we get (y) 1 (y) for all y 2 (S). On the other hand, by construction of
, we have (x) (y) c(x, y) for any x M and any y 2 (S). Therefore
1 (y) = (y)

y 2 (S).

(3.10)

By Proposition 3.5, we have

(x) = sup 1 (y) c(x, y) | y M

x M,

and by (3.8) and (3.10), we also have

(x) = sup 1 (y) c(x, y) | y 2 (S)

x M.

We claim that 1 defined by (3.9) satisfies

1 (y) = inf (x) + c(x, y) | x 1 (S)

y M.

If not, there are x , y M and > 0 such that


(x ) + c(x , y ) (z) + c(z, y )

z 1 (S).

90

3 Introduction to Optimal Transport

Taking the infimum in the right-hand side we get


(x ) + c(x , y ) 1 (y) .
But by construction of 1 , we have 1 (y) (x ) + c(x , y ). We get a contradiction.
It remains to show that both , c are finite valued and continuous provided c is
continuous and S is compact. We claim that under those assumptions, c is bounded
from above on 2 (S). Since is not identically +, there is x M with (x ) <
+. Since c is continuous and 2 (S) is compact, the function y c(x , y) is bounded
on 2 (S). Then we deduce that c is bounded on 2 (y). By (3.5), we infer that (x)
is finite for any x M. Let x M be fixed and {xk }k be a sequence converging to x.
For every k > 0, there is yk 2 (S) such that
 1

(xk ) c (yk ) c xk , yk + .
k
Then we have for every k > 0,








(x) c (yk ) c x, yk = c (yk ) c xk , yk + c xk , yk c x, yk




1
(xk ) + c xk , yk c x, yk .
(3.11)
k
For every k > 0, there is zk 2 (S) such that
 1

(x) c (zk ) c x, zk + .
k
Then we also have for every k > 0,








(xk ) c (zk ) c xk , zk = c (zk ) c x, zk + c x, zk c xk , zk




1
(x) + c x, zk c xk , zk .
(3.12)
k
Let V be a compact neighborhood of x. The function c is continuous on the compact
set V 2 (S), hence it is uniformly continuous. We conclude easily from (3.11)
(3.12) that (xk ) tends to (x) as k tends to +. In the same way, we can show
that is bounded on 1 (S) and c if always valued in R and continuous. The fact
that the infimum and supremum in (3.5)(3.6) are attained is straigthforward from

the continuity of , c and the compactness of S.


Corollary 3.8 Let , be two probability measures on M. Assume that c is continuous and that Supp() and Supp() are compact. Then there is a c-cyclically
monotone compact set S Supp() Supp() such that for every (, )
the following properties are equivalent:

3.2 Optimal Plans and Kantorovitch Potentials

91

(i) is optimal.
(ii) Supp() S .
Proof By Theorem 3.3, there is a c-cyclically monotone compact set S Supp()
Supp() such that the support of any optimal transport in (, ) is contained in
S . Let us show that S satisfies the equivalence given in the statement of the theorem. First, by construction we have (i) (ii). By Theorem 3.7, there is a c-pair of
potentials with S c . Then we have
c (y) (x) = c(x, y)

(x, y) S .

(3.13)

Furthermore we have
c (y) (x) c(x, y)

c(x, y)

x, y M.

(3.14)

Let us show that (ii) (i). Let (, ) be such that Supp() S . On the one
hand, by (3.13), we have


(y) d(y)

(x) d(x) =

 c

(y) (x) d(x, y)

MM
c(x, y) d(x, y) = C().

=
MM

On the other hand, (3.14) yields for every (, ),





c (y) d(y)
M

 c

(y) (x) d (x, y)

(x) d(x) =
M

MM

c(x, y) d(x, y) = C( ).

MM

This shows that is optimal.

Remark 3.7 Let , be two compactly supported probability measures on M and


c : M M [0, +) be a continuous cost. Actually, the proof of Corollary 3.1
shows that if (, c ) is a c-pair of potentials and is a transport plan between
and with Supp() c , then is optimal, that is CK (, ) = C().

3.3 A Generalized Brenier-McCann Theorem


Throughout this section, we fix a cost c : M M [0, +) which is assumed
to be continuous. Given two compactly supported probability measures , on M,
we know by Theorems 3.3 and 3.7 that there is a c-cyclically monotone compact set
S Supp() Supp() which contains the supports of all optimal plans between
and and a c-pair of real-valued continuous potentials (, c ) satisfying

92

3 Introduction to Optimal Transport

(x) = max c (y) c(x, y) | y 2 (S )

x M,

(3.15)

c (y) = min (x) + c(x, y) | x 1 (S )

y M,

(3.16)

and
S c .

(3.17)

To prove the existence and uniqueness of an optimal transport map, we will show
that S is concentrated on a graph. More precisely, we will prove that for every x
outside a -negligible set N M, the set c (x) is a singleton.
Theorem 3.9 Let , be two probability measures on M. Assume that c is continuous and that Supp() and Supp() are compact. Let S and (, c ) given by
Theorems 3.3 and 3.7 as above. Moreover assume that for -a.e. x M, the set
c (x) is a singleton. Then there is a unique optimal transport map from to . It
satisfies


c (x) = T (x)

a.e. x M.

(3.18)

Proof By Theorem 3.1, there is an optimal transport plan between and . By


assumption, there is a Borel set N such that (N) = 0 and for every x
/ N, c (x) is
a singleton {yx }. Then for every (x, y) Supp() \ (N M), we have (x, y) c ,
that is y = yx . Setting T (x) := yx for -a.e. x M, we get (3.18) and in turn the
uniqueness.

Remark 3.8 We maybe need to make clear what me mean by uniqueness of an


optimal transport map. We say that there is a unique optimal transport map from
to if there is uniqueness up to a set of -measure zero. That is if T1 and T2 are
two optimal transport maps from to , there is a set N with (N) = 0 such that
/ N.
T1 (x) = T2 (x) for every x
We now introduce an assumption on the cost c. For this we need to define the
notion of sub-differential. Given an open set M and a function f : R, we
say that p Tx M is a sub-differential for f at x if there is a function : R
which is differentiable at x with Dx = p such that (see Fig. 3.3)
f (x) = (x) and f (y) (y)

y .

We denote by Dx f the set of sub-differentials of f at x. In the same way, we say


that p Tx M is a super-differential for f at x if there is a function : R
which is differentiable at x with Dx = p such that
f (x) = (x) and f (y) (y)

y .

3.3 A Generalized Brenier-McCann Theorem

93

Fig. 3.3 The function is a support function from below for f at x

Fig. 3.4 The function x |x|

We denote by Dx+ f the set of super-differentials of f at x.


Remark 3.9 If f : R is differentiable at x , then Dx f = Dx+ f = {Dx f }.
Remark 3.10 The sub-differential and/or the super-differential may not be a singleton. It could be empty or contain several sub-differentials.
For example, the sub-differential of the function x |x| at the origin is the
interval [1, 1] while its super-differential is empty (Fig. 3.4).
By (3.15), for every (x, y) c there is a link between the super-differentials of
at x and the sub-differentials of the cost c at (x, y). This lead us to the following
definition which will be satisfied by variational costs.
Definition 3.10 We say that the cost c satisfies the sub-TWIST condition if




Dx c , y1 Dx c , y2 =

y1 = y2 M, x M,

where Dx (, yi ) denotes the sub-differential of the function x c(x, yi ) at x.


The following result makes the sub-TWIST condition relevant.
Lemma 3.11 Assume that the cost c satisfies the sub-TWIST condition. Let (, c )
be a c-pair of potentials and x M be such that has a non-empty super-differential
at x. Then c (x) is a singleton.

94

3 Introduction to Optimal Transport

Proof Argue by contradiction and assume that y1 = y2 both belong to c (x). Then
we have
z M.
c (yi ) = (x) + c(x, yi ) (z) + c(z, yi )
Thus, for every i = 1, 2,
c(z, yi ) (z) + (x) + c(x, yi ),
with equality at z = x. Since is super-differentiable at x, we infer that both
functions z c(z, y1 ) and z c(z, y2 ) share a common sub-differentiable at x.
This contradicts the sub-TWIST condition.

By Theorem 3.9 and Lemma 3.11, in order to prove the existence and uniqueness
of optimal transport maps from a compactly supported probability measure to
another one , it is sufficient to show that the super-differential of the potential
is non-empty for -almost every point in M. Such a property can be obtained
thanks to Rademachers Theorem. We recall that a function defined on a smooth
manifold is called Lipschitz in charts if it is Lipschitz in a set of local coordinates
in a neighborhood of any point. The Rademacher Theorem asserts that any function
which is Lipschitz in charts on an open subset of M is differentiable almost
everywhere in .
Theorem 3.12 Let c : M M [0, +) be a cost which is Lipschitz in charts
and satisfies the sub-TWIST condition. Let , be two probability measures with
compact support on M. Assume that is absolutely continuous with respect to the
Lebesgue measure. Then there is existence and uniqueness of an optimal transport
map from to . In fact, there is a c-convex function : M R which is Lipschitz
in charts such that


c (x) = T (x)

a.e. x M.

(3.19)

Proof By Theorems 3.3 and 3.7 there is a c-cyclically monotone compact set S
Supp()Supp() which contains the supports of all optimal plans between and
together with a c-pair of real-valued continuous potentials (, c ) such that (3.15)
(3.17) are satisfied. In a neighborhood of each x M, the function is the maximum
of a family of functions x 2 (S) c (y) c(x, y) with y 2 (S ) which are
uniformly Lipschitz (in charts) in the x variable . Therefore, is Lipschitz in charts
on M. Since is assumed to be absolutely continuous with respect to the Lebesgue
measure, Rademachers Theorem implies that is differentiable and a fortiori superdifferentiable -a.e. We conclude easily by Theorem 3.9 and Lemma 3.11.

Example 3.7 (Breniers Theorem) Let M = Rn and c : Rn Rn [0, +) be


the quadratic Euclidean cost or Brenier cost defined by c(x, y) = |y x|2 /2 for
any x, y Rn . Remembering Example 3.6, we know that c-convex functions are
the functions : Rn R {+} such that the function + | |2 /2 is convex.
Furthermore, c satisfies the sub-TWIST condition. As a matter of fact, it is smooth

3.3 A Generalized Brenier-McCann Theorem

95

and its partial derivative with respect to the x variable is given by


c
(x, y) = x y
x

x, y Rn .

Therefore y1 = y2 Dx c(, y1 ) = Dx c(, y2 ). By Theorem 3.12, given a pair of


compactly supported probability measures , in Rn with absolutely continuous
with respect to the Lebesgue measure, there is a unique optimal transport map T :
M M from to satisfying (3.19) where : Rn R is a locally Lipschitz
c-convex function. Note that for every x Rn where is differentiable at x, we have
y c (x) = (x) + c(x, y) (z) + c(z, y) z Rn ,
which means that the derivative of the function z (z)+c(z, y) vanishes at z = x,
that is y = x + x . Setting (x) := (x) + |x|2 /2 for every x M, we obtain a
convex function such that
T (x) = x

a.e x Rn .

In other terms, the unique optimal transport map from to is given by the gradient
of a convex function.
Example 3.8 Let M = Rn , note that the Monge cost c : Rn Rn [0, +) given
by c(x, y) = |y x| (cf. Examples 3.2, 3.5) is Lipschitz but does not satisfy the
sub-TWIST condition. As a matter of fact, we have
xy
c
(x, y) =
x
|x y|

x = y Rn .

This means that Dx c(, y1 ) = Dx c(, y2 ) for any y1 , y2 such that y1 x and y2 x
are positively colinear. Hence Theorem 3.15 do not apply. In fact, we already saw
through Example 3.2 that uniqueness of optimal transport maps does not hold in this
context.
Example 3.9 (McCanns Theorem) Let (M, g) be a complete Riemannian manifold.
The geodesic distance dg is Lipschitz in charts on M M. Define the quadratic
geodesic cost or McCanns cost c : M M [0, +) by
c(x, y) :=

1 2
d (x, y)
2 g

x, y M.

Then c is Lipschitz in charts on M M and satisfies the sub-TWIST condition. As


a matter of fact, given x M and p Tx M in Dx c(, y) for some y M, there is a
function : M R which is differentiable at x with Dx = p such that
1 2
1 2
dg (x, y) = (x) and
d (z, y) (z)
2
2 g

z M.

96

3 Introduction to Optimal Transport

Then we argue as in the proof of Lemma 2.15. If we denote by : [0, 1] M a


minimizing geodesic from y to x, then we obtain that for every curve : [0, 1] M
with (0) = y,


1
energyg ( ) (1) 0,
2
with equality for = . As in Lemma 2.15, we infer that there is a unique minimizing
geodesic between x and y and that


y = expx Dx = expx (p),
in
where expx : Tx M M stands for the exponential map which was defined

g 
Sect. 2.3 (if we use the Riemannian exponential map, we have y = expx x ).
The point y is uniquely determined by p, then c satisfies the sub-TWIST condition.
Moreover we note that if a potential : M R is (super-) differentiable at x M
and y c (x), then
c(z, y) (z) + (x) + c(x, y)

z M,

with equality at z = x. Then arguing as above, we deduce that for every pair of
compactly supported probability measures , on M with absolutely continuous
with respect to the Lebesgue measure, there is a unique optimal transport map T
from to satisfying (3.26) where : M R is a c-convex function which is
Lipschitz in charts. By the above discussion, we have


T (x) = expx Dx

a.e x M

(3.20)

and for -a.e. x M there is a unique minimizing geodesic from x to T (x).


Let M be a smooth connected manifold equipped with a complete sub-Riemannian
structure (, g) and whose sub-Riemannian distance is denoted by dSR . In the
Sect. 3.4 our purpose is now to study the Monge problem for the sub-Riemannian
quadratic cost, that is for the cost c : M M [0, +) defined by
c(x, y) :=

1
dSR (x, y)2
2

x, y M.

As we saw before, in order to obtain existence and uniqueness results for optimal
transport maps, it is convenient to be able to show that super-differentials of potentials
are non-empty almost everywhere and that some sub-TWIST condition is satisfied
by the cost function. The sub-TWIST condition follows immediately from Lemma
2.15. So we just have to deal with regularity issues of c-convex functions. In the case
of compactly supported probability measures, regularity properties of Kantorovitch
potentials can be obtained from the regularity of the cost. We develop this approach

3.3 A Generalized Brenier-McCann Theorem

97

in the Sect. 3.4 by showing that under additional assumptions the sub-Riemannian
distance is Lipschitz and even locally semiconcave outside the diagonal.
Remark 3.11 As explained above, if M equipped with a SR structure for which the
2 is Lipschitz on M M, then for every pair of compactly supported
cost c = dSR
probability measures , on M with absolutely continuous with respect to the
Lebesgue measure, there is a unique optimal transport map T from to which can
be expressed as


T (x) = expx Dx

a.e x M,

(3.21)

where : M R is a c-convex function which is Lipschitz in charts.

3.4 Optimal Transport on Ideal and Lipschitz SR Structures


Ideal SR structures. Let (, g) be a sub-Riemannian structure of rank m n on M.
We call it ideal if it is complete and has no non-trivial minimizing singular curves.
We recall that this implies that for every x = y M, any minimizing geodesic
: [0, 1] M joining x to y is regular. By the results of the Chap. 2, all minimizing geodesics are smooth and projections of normal extremals of the Hamiltonian
geodesic equation. We recall that D denotes the diagonal of M M, that is, the set
of all pairs of the form (x, x) with x M. Sub-Riemannian distances of ideal SR
structures are locally semiconcave outside the diagonal.
A function f : R, defined on the open set M, is called locally
semiconcave on if for every x there exist a neighborhood x of x and a
smooth diffeomorphism x : x x (x ) Rn such that f x1 is locally
semiconcave on the open subset x = x (x ) Rn . By the way, we recall that the
function f : R, defined on the open set Rn , is locally semiconcave on
if for every x there exist C, > 0 such that


f (y) + (1 )f (x) f x + (1 )y
(1 )C|x y|2

 
[0, 1], x, y B x , .

This is equivalent to say that the function f can be written locally as




f (x) = f (x) C|x|2 + C|x|2

 
x B x , ,

with f (x) C|x|2 concave, that is as the sum of a concave function and a smooth
function. Note that every locally semiconcave function is locally Lipschitz on its
domain, and thus, by Rademachers Theorem, it is differentiable almost everywhere
on its domain.

98

3 Introduction to Optimal Transport

Fig. 3.5 Graph of a semiconcave function

The following result is useful to prove the local semiconcavity of a given function
(Fig. 3.5).
Lemma 3.13 Let f : R be a function defined on an open set Rn . Assume
that for every x there exist a neighborhood V of x and a positive real
number such that, for every x V , there is px Rn such that
u(y) u(x) + px , y x + |y x|2

y V .

Then the function u is locally semiconcave on .


Proof (Proof of Lemma 3.13) Let x be fixed and V be the neighborhood given
by assumption. Without loss of generality, we can assume that V is an open ball
B. Let x, y B and [0, 1]. The point x := x + (1 )y belongs to B. By
assumption, there exists p Rn such that
u(z) u(x ) + p, z x + |z x |2

z B.

Hence we easily get


u(y) + (1 )u(x) u(x ) + |x x |2 + (1 ) |y x |2


u(x ) + (1 )2 + (1 )2 |x y|2
u(x ) + 2(1 ) |x y|2 ,
and the conclusion follows.

Remark 3.12 Thanks to Lemma 3.13, a way to prove that a given function f :
R is locally semiconcave on is to show that for every x we can put a C 2
support function on the graph of u at x with a uniform control of the C 2 norm of .
Outside the diagonal, sub-Riemannian distances of ideal SR structures enjoy the
same kind of regularity as Riemannian distances.
Theorem 3.14 Let (, g) be an ideal sub-Riemannian structure on M. Then the
SR distance is continuous on M M and locally semiconcave on M M \ D. In
particular, dSR is Lipschitz in charts on M M \ D.

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

99

Proof The continuity of dSR follows from Proposition 1.13. To prove the local semiconcavity, we proceed as explained in Remark 3.12. Let us fix (x, y) M M \D and
x,1
be a minimizing geodesic joining x to y. There is an open neighborhood

V of ([0, 1]) in M and an orthonormal family F (with respect to the metric g) of
m smooth vector fields X 1 , . . . , X m such that

z V .
(z) = Span X 1 (z), . . . , X m (z)
Taking a change of coordinates if necessary, we may assume that V is an open subset
of Rn . Furthermore, there is a control u L 2 ([0, 1]; Rm ) such that
(t) =

m


ui (t)X i ( (t))dt

a.e. t [0, 1].

i=1

Since u is regular, there are v1 , . . . vn in L 2 ([0, 1]; Rm ) such that the linear operator
Rn Rn

m



x,1
vi
i Du EF

i=1

is invertible. Define locally F : Rn Rn by


F : Rn Rn Rn Rn

z,1

(z, ) z, EF

u +

m


i vi

i=1

This mapping is well-defined and C 2 in a neighborhood of (x, 0). Moreover it satisfies


F (x, 0) = (x, y),
and its differential at (x, 0) is invertible. Hence by the Inverse Function Theorem,
there are an open ball B centered at (x, y) in Rn Rn and a function G : B Rn Rn
of class C 2 such that
F G (z, w) = (z, w)

(z, w) B.

Denote by 1 the second component of G . From the definition of the subRiemannian energy between two points, we infer that for any (z, w) B we have

m 

 1
(z, w) vi .
eSR (z, w) u +

i
i=1

L2

100

Set

3 Introduction to Optimal Transport

m 

1 (z, w)
x,y (z, w) := u +

i
i=1

(z, w) B.

L2

We conclude that, there is a function x,y of class C 2 such that dSR (z, w) x,y (z, w)
for any (z, w) in a neighborhood of (x, y) in M M, and dSR (x, y) = x,y (x, y). By
compactness, the C 2 norms of the functions x,y are uniformly bounded. As a matter
of fact, from Remark 2.2 we know that the set of minimizing geodesics from x to y is
compact with respect to the uniform topology; any sequence of minimizing geodesics
{k }k from xk to yk converges uniformly to a minimizing geodesic from x to y. We
also know (see Remark 2.1) that if we cover the set of minimizing curves from x to
y by a finite number of open tubes admitting orthonormal frames, then minimizing

control converge in L 2 . We conclude easily.


Remark 3.13 The above arguments can be used to prove the following result. Let
(, g) be a sub-Riemannian structure of rank m < n on M. Assume that it is complete
and that there is an open set M M such that for every (x, y) with x = y,
any minimizing geodesic between x and y is regular. Then dSR is locally semiconcave
on \ D.
Remark 3.14 Any SR structure of rank m = n, that is any Riemannian structure on
M is ideal, see Remarks 1.9, 2.3.
Lipschitz SR structures. Let (, g) be a sub-Riemannian structure of rank m < n
on M. We call it Lipschitz if it is complete and if the sub-Riemannian distance
function is Lipschitz in charts on M M outside the diagonal (or equivalently if the
sub-Riemannian energy is Lipschitz in charts on M M \ D). A particular case of
Lipschitz SR structures is given by ideal SR structures. The aim of the present section
is to provide a weaker sufficient condition for a complete SR structure to be Lipschitz.
According to Theorem 2.22, a horizontal path : [0, 1] M will be called a Goh
path if it admits an abnormal lift : [0, 1] which annihilates [, ], that is,
an abnormal lift = ( , p) : [0, 1] T M (in local coordinates, see Proposition
1.11 and the subsequent remarks) such that for every local parametrization of by
smooth vector fields X 1 , . . . , X m in a neighborhood of ([0, 1]), we have



p(t) X i , X j (t) = 0

t [0, 1], i, j = 1, . . . , m.

Of course, the above definition does not depend upon the parametrization.
Theorem 3.15 Let (, g) be a complete sub-Riemannian structure on M, assume
that any sub-Riemannian minimizing geodesic joining two distinct points in M is not
a Goh path. Then, the SR structure (, g) is Lipschitz.
x,1
Proof Let us fix (x, y) M M \D and
a minimizing geodesic joining x to
y. As before, denote by F = {X 1 , . . . , X m } an orthonormal family of vector fieldsd

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

101

along ([0, 1]) and by u the control associated with . Two cases may appear:
First case: u := u is not singular.
Then by the arguments given in the proofs of Lemma 2.18 and Theorem 3.14, there
are , K > 0 such that
eSR (x, z) eSR (x, y) + K|z y|

z B(y, ).

Since any control


whichis close enough to u is regular, there is > 0 such that for

every u L 2 [0, 1]; Rm satisfying

u u

L2



x,1
< , eSR x, EF
(u) = uL2 ,

there holds






x,1
x,1
(u) + 2K z EF
(u) ,
eSR (x, z) eSR x, EF

(3.22)



x,1
for every z B EF
(u), /2 .
Second case: u = u is singular.
By Theorem 2.20, we have necessarily


x,1
ind Du2 EF


x,1
|Ker(Du EF
)

= +,

(3.23)



x,1
for all Im Du EF
\ {0}. Recall that C : L 2 ([0, 1]; Rm ) is defined by


u L 2 [0, 1]; Rm .

C(u) := u2L2

Let E0 L 2 ([0, 1]; Rm ) be a vector space such that






x,1
= L 2 [0, 1]; Rm .
E0 + Ker Du EF
Set







x,1
x,1
Ker (Du C) and F := EF
E := E0 Ker Du EF

|{u}+E

x,1
By construction, Du EF
and Du F have the same image in Rn and E0 has finite
dimension. Then by (3.23), we have



ind Du2 F

|Ker(Du F)


= +,

102

3 Introduction to Optimal Transport

for all Im (Du F) \ {0}. We can apply Theorem B.4 to the function

 F.Hence
there are c > 0, (0, 1) such that for every (0, ) and every z B F u , c2 ,
there are w1 , w2 L 2 ([0, 1]; Rm ) such that


z = F u + w1 + w2

(3.24)




w1 Ker Du F , w1 L2 < , w2 L2 < 2 .

(3.25)

and



Let z B(y, c2 ) with |z y| = c2 /2. Then there are w1 , w2 L 2 [0, 1]; Rm such
that (3.24)(3.25) are satisfied. Set u := u + w1 + w2 . Then we have
x,1
(u),
z = EF

and (note that Ker(Du F) Ker(Du C)),


2
 


eSR (x, z) C(u) C u + Du C w1 + w2 + w1 + w2 L2

2
= eSR (x, y) + Du C w2 + w1 + w2 L2

2

eSR (x, y) + 2u L2 2 + + 2

4u L2 + 8
eSR (x, y) +
|z y|.
c
Proceeding as in the proof of Theorem B.4, we can show that the above estimate
holds in a neighborhood of u , that is (taking c > 0, (0, 1) smaller if necessary)
for every (0, ), for every u L 2 [0, 1]; Rm , and every z Rn with

u u

L2





x,1
< , z EF
(u) < c 2 ,

there are w1 , w2 L 2 ([0, 1]; Rm ) such that



x,1 
u + w1 + w2
z = EF
and



w1 Ker Du C , w1 L2 < , w2 L2 < 2 .


This shows that for every u L 2 [0, 1]; Rm satisfying

u u

L2



x,1
< , eSR x, EF
(u) = uL2 ,

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

103

there holds


4u 2 + 8  


x,1
x,1
L
(u) +
eSR (x, z) eSR x, EF
z EF (u) ,
c

(3.26)



x,1
for every z B EF
(u), c /4 .
Let us explain how to conclude by compactness. Let x M and B a compact set
in M such that {x} B D = be fixed. Denote by S the set of all y B such
that there is at least one singular minimizing geodesic between x and y. The set S is
a compact subset of B, and the set of singular minimizing geodesic between x and
a point in S is compact with respect to the uniform topology. Then by the previous
observation (second case) together with a compactness argument (see Remarks 2.1,
2.2), we infer that an inequality of the form (3.26) holds for any minimizing control u
which is close enough to a control corresponding to a singular minimizing geodesic
joining x to a point in S . Denote by S the set of y in B corresponding to such
controls. By construction, any minimizing geodesic from x to a point in B \ S is
regular. Actually, it is far from being singular. Then by the arguments given in the first
case together with compactness arguments, an inequality of the form (3.22) holds
x,1
(u)) in B \ S . In that way, we prove that eSR (x, ) (or equivalently
for any y (= EF
dSR (x, )) is locally Lipschitz in M \ {x}. The same proof shows that eSR is indeed
uniformly locally Lipschitz with respect to one variable. We conclude easily.

Remark 3.15 The above arguments can be used to prove the following result. Let
(, g) be a sub-Riemannian structure of rank m < n on M. Assume that it is complete
and that there is an open set M M such that for every (x, y) with x = y,
no minimizing geodesic between x and y is a Goh path. Then dSR is Lipschitz in
charts on \ D.
Remark 3.16 Note that if the path is constant on [0, 1], it is a Goh path if and only
if there is a differential form p T(0) M satisfying

p X i ( (0)) = p X i , X j ( (0)) = 0

i, j = 1, . . . , m,

where X 1 , . . . , X m is as above a parametrization of in a neighborhood of (0).


The above proof shows that if is 2-generating then eSR is Lipschitz in charts on
M M.
Remark 3.17 If a SR structure (, g) on M is Lipschitz, then for every x M, the
exponential mapping expx is onto. In fact, for every y there is a minimizing geodesic
joining x to y which is normal. This can be shown by the arguments which were
given at the end of the proof of Theorem 2.14.
A Brenier-McCann Theorem on Lipschitz SR structures. Before stating our existence and uniqueness result for Lipschitz SR structures, we introduce a definition.

104

3 Introduction to Optimal Transport

Definition 3.16 Given a c-convex function : M R, we call moving set M


and static set S respectively the sets defined as follows:

M := x M | x c (x) ,

S := M \ M = x M | x c (x) .
As shown by the following result, under classical assumptions on the measures
and Lipschitzness of the sub-Riemannian structure, static points do not move while
moving points obey a transportation law of the form (3.20)(3.21).
Theorem 3.17 Let (, g) be a Lipschitz sub-Riemannian structure on M and ,
be two compactly supported probability measures on M. Assume that is absolutely
continuous with respect to the Lebesgue measure. Then there is existence and
uniqueness of an optimal transport map from to for the SR quadratic cost
c : M M [0, +) defined by
c(x, y) :=

1 2
d (x, y)
2 SR

x, y M.

In fact, there is a continuous c-convex function : M R such that the following


holds:
(i) M is open, and is Lipschitz in charts on M . In particular is differentiable
-a.e. in M .
(ii) For -a.e, x S , c (x) = {x}.
In particular, there exists a unique optimal transport map defined -a.e. by

T (x) :=

expx (Dx ) if x M ,
x
if x S ,

and for -a.e. x M there exists a unique minimizing geodesic between x and T (x).
Proof Let S Supp() Supp() and (, c ) be respectively the c-cyclically
monotone set and the c-pair of potentials satisfying (3.15)(3.17). Since the sets
Supp(), Supp() are assumed to be compact, both , c are indeed continuous
and the supremum and infimum in (3.15)(3.16) are attained. We check easily that
x M belongs to S if and only if (x) = c (x). Then M coincides with the
set

x M | (x) = c (x) = x M | (x) > c (x) ,


which is open by continuity of and c . Let us now prove that is Lipschitz in
charts in an open neighborhood of M Supp(). Let x M be fixed. Since
x c (x) and c (x) is closed in M (by continuity of , c and compactness of

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

105

S ), there is r > 0 such that dSR (x, y) > 2r for any y c (x). In addition, since
the set c is closed in M M (again by continuity of , c and compactness of
S ), there exists a neighborhood Vx of x which is included in M such that
dSR (z, w) r

z Vx , w c (z).

Let x,r : M R be the function defined by

1 2
x,r (z) := sup c (y) dSR
(z, y) | y 2 (S ), dSR (z, y) r .
2
By construction, coincides with x,r on Vx . By assumption, dSR is Lipschitz
in charts outside the diagonal, then by compactness of S we deduce that x,r is
Lipschitz in charts. In conclusion  is Lipschitz in charts on M and (i) is proved.
To prove (ii), we observe that it suffices to prove the result for x belonging to
an open set V M on which the horizontal distribution (x) is parametrized by a
orthonormal family a smooth vector fields F = {X 1 , . . . , X m }. In fact, up to working
in charts, we can assume that V is a convex subset of Rn where the C 2 -norms of
the X i s are bounded. Let us fix a compact ball B in V and show that (ii) holds for
-a.e. x B.
Recall that the Hamiltonian H : V (Rn ) R which is associated to our
sub-Riemannian structure is defined by (see Chap. 2)
2
1 
p X i (x)
2
m

H(x, p) :=

 
(x, p) V Rn .

i=1

For every p (Rn ) \ {0}, denote by p the linear hyperplane in Rn which is


orthogonal to p, that is

p := v Rn | p v = 0 .
From Lemma 2.11 and its proof, for every x V and every p (Rn ) with H (x , p ) =
0, there is > 0 such that the Dirichlet problem


 
H(x, Dx S(x)) = H x , p ,
S|x+p = 0,

(3.27)

admits a solution of class C 1 on the ball B(x , ). We leave the reader to check that
the radius depends continuously on x , H(x , p) and |p| (|p| denotes the Euclidean
norm of p). Then, by compactness of B there is a function
: (0, +) (0, +) (0, )
which is decreasing in the first variable and increasing in the second variable such
that for every x B and every p (Rn ) with H (x , p ) = 0, the solution to (3.27) is

106

3 Introduction to Optimal Transport

Fig. 3.6 Characteristics of Sx ,p

defined on the open ball B (x , (H(x , p ), |p|)). For any x , p satisfying the previous
assumptions, we denote by



Sx ,p : B x , H(x , p ), |p| R
the solution to the Dirichlet problem (3.27), with x ,p := (H(x , p ), |p|). The functions Sx ,p being constructed by the method of characteristics (see Proof of Lemma
2.11), the following result holds (note that the parametrization of characteristics that
we use in the statement of Lemma 3.18 differs from the one which is used to construct
Sx ,p , see last statement).
Lemma 3.18 There is a function
: (0, +) (0, +) (0, +)
which is increasing in the first variable and decreasing in the second variable such
that the following property holds (see Fig. 3.6):
n
For every x B, for every p (R ) with H (x , p ) = 0, and every x
B x , x ,p /2 there are




zx ,p (x) x + p B x , x ,p
 
 

and tx ,p (x) H(x , p ), |p| , H(x , p ), |p|
such that



x = x ,p tx ,p (x); zx ,p (x)



where (we set x , p := H(x , p ), |p| )

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

107





 

 
x ,p ; zx ,p (x) , px ,p ; zx ,p (x) : x ,p , x ,p V Rn
is the solution to the Hamiltonian system




 H 



x ,p t; zx ,p (x) =
x ,p t; zx ,p (x) , px ,p t; zx ,p (x)
p








p x ,p t; zx ,p (x) = H x ,p t; zx ,p (x) , px ,p t; zx ,p (x) ,


x
with




x ,p 0; zx ,p (x) = zx ,p (x) and px ,p 0; zx ,p (x) = p .
In particular, x ,p is an horizontal path joining zx ,p (x) to x which satisfies







H x ,p t; zx ,p (x) , px ,p t; zx ,p (x) = H zx ,p (x), p



t x ,p , x ,p .

For every x V , we denote by (x) the set of p (Rn ) such that H(x, p) = 0.
Pick a sequence {(xk , pk )}k of B (Rn ) which is a dense subset of

 
(x, p) B Rn | p (x) .
and set for every k,
k := xk ,pk , k := xk ,pk , tk () := txk ,pk (),
zk () := zxk ,pk (), k (, ) := xk ,pk (, ), pk (, ) := pxk ,pk (, ).
The following result is a consequence of the Lipschitz regularity of the subRiemannian distance along horizontal paths together with Rademachers theorem.
Lemma 3.19 There is a set N of Lebesgue measure zero in V such that for every
x B \ N and any k, the following property holds:




x B xk , k /2 and x = k t; zk (x)
 

= s k s; zk (x) is differentiable at t.
Proof (Proof
3.19) Let k be fixed. By construction, all the curves k (; z)

 of Lemma
(with z x + pk B (xk , k )) are horizontal with respect to the distribution (we
may assume
 withoutloss of generality that the curves k (; z) are defined on (k , k )
for all z x + pk B (xk , k )). The potential is expressed as

1 2
(x, y) | y 2 (S )
(x) = max c (y) dSR
2

x M,

108

3 Introduction to Optimal Transport

with c continuous and 2 (S ) compact. Hence, given s (k , k ), there is y


2 (S ) such that
  
  1 2    
k s ; z = c y dSR
k s ; z , y .
2
Then we have for every s (k , k ),
  1 2
(k (s; z)) c y dSR
(k (s; z), y )
2



 
   
2
2
c y dSR
k (s; z), k,l s ; z dSR
k s ; z , y
2


  
k s ; z 2H z, pk s s 


  

k s ; z 4k H z, pk s s .
This shows that each function s (k (s; z)) is locally Lipschitz on its domain.
By Rademachers theorem, we infer that it is
almost everywhere on

 differentiable
(k , k ). Since the paths k (; z) with z xk + pk B (xk , k ) laminate a set
which is bigger than the ball B(xk , k /2) in a continuous way, Fubinis theorem
implies the existence of a negligeable set Nk,l such that the property stated in the

lemma holds for k. We conclude by setting N = Nk .


Before starting the proof of (ii), we need a last result giving an estimates
 on
the
deviation
of
normal
geodesics.
For
every
(x,
p),
we
denote
by

:=
,
p
x,p
x,p


x,p (; x), px,p (; x) , the solution of the Hamiltonian system starting at (x, p); it is
defined on the interval ( (H(x, p), |p|), (H(x, p), |p|)).
Lemma 3.20 There is a function
C : (0, +) (0, +) (0, +)
which is decreasing in the first variable and increasing in the second variable such
that the following property holds:
For every h, R > 0, every k, and every (x, p) B (Rn ) satisfying
 




H xk , pk , H(x, p) > h, pk , |p| < R, x B xk , k /2 ,

(3.28)

 


 


k tk (x) + s; zk (x) x,p (s) C(h, R) pk tk (x); zk (x) p s,

(3.29)

one has

for every s ( (h, R), (h, R)) (tk (x) (h, R), tk (x) + (h, R)).
Proof (Proof of Lemma 3.20) Since the C 1 -norms of the X i s are bounded on V ,
there is an increasing function P : (0, +) (0, +) such that the solutions to
our Hamiltonian system starting from a pair (x, p) with x B, H(x, p) > h and

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

109

|p| < R remains in the set V B(0, P(R)) on the interval ( (h, R), (h, R)) (note
that since H is constant along the Hamiltonian trajectories, the solutions remains
in the set {H(x, p) > h}). Now, considering Lipschitz constants of the Hamiltonian
vector field on the cylinder V B(0, P(R)) (the C 2 -norms of the X i s are bounded
on V ) and using Gronwalls Lemma (see Appendix A), we prove easily the existence
of an increasing function C : [0, +) [0, +) such that
  

 


k tk (x) + s; zk (x) x,p (s) + pk tk (x) + s; zk (x) px,p (s)
 


C(R) pk tk (x); zk (x) p ,

(3.30)

for every h, R > 0, every k, and every (x, p) B (Rn ) satisfying (3.28), and
every s ( (h, R), (h, R))(tk (x) (h, R), tk (x) + (h, R)). Let us denote
by I the latter interval and set

 

u(s) := k tk (x) + s; zk (x) x,p (s)

s I.

Considering again the Lipschitz constants of the Hamiltonian vector field that we
always denote by K, we obtain formally for every s,


u(s) = 


 
 H 
 
H  
k tk (x) + r; zk (x) , pk tk (x) + r; zk (x)
x,p (r), px,p (r) dr 
p
0 p
 s
 s
 

pk tk (x) + r; zk (x) px,p (r) dr,
K
u(r) dr + K
s

which by (3.30) gives



u(s) K


u(r) dr + K

 


C(R) pk tk (x); zk (x) p dr.

Gronwalls Lemma (see Lemma A.1) concludes the proof.

We are now ready to prove that for every x B \ N, we have c (x) = {x}. Fix
x B \ N and argue by contradiction, that is assume that there is y = x such that
y c (x) \ {x}. Then we have (remembering Remark 3.6)
 
 
(x) + c x, y (z) + c z, y

z M,

which can be written as


(x) (z)

1 2   1 2  
d z, y dSR x, y
2 SR
2

z M.

(3.31)

Since dSR is Lipschitz in charts outside the diagonal, there is a normal minimizing
geodesic joining x to y (see Remark 3.17), that is there is p Tx M such that
expx (p) = y and dSR (x, y)2 = 2H(x, p) = 0. Note that since x belongs to c (x),
we have

110

3 Introduction to Optimal Transport

(x) = (x) + c(x, x) (z) + c(z, x)

z V .

Set h := H(x, p)/2, R := 2|p| and pick k such that


H (xk , pk ) > h, |pk | , |p| < R, x B (xk , k /2) .
Applying the previous inequality with z = k (tk (x) + s; zk (x)) and s small yields

 
k tk (x); zk (x) = (x)
 
 1 2  
 
k tk (x) + s; zk (x) + dSR
k tk (x) + s; zk (x) , x
2

 
k tk (x) + s; zk (x) + H (zk (x), pk ) s2 ,
path joining x = k (tk (x); zk (x)) to the point
because k (; zk (x)) is an horizontal

k (tk (x) + s; zk (x)) of length s 2H (zk (x), pk ). Since x does not belong to N, the
function

 
s k tk (x) + s; zk (x)
is differentiable at s = 0. Then Lemma 3.19 together with the previous inequality
allows us to write

d  
k tk (x) + s; zk (x) |s=0 = 0.
ds

(3.32)

Since dSR is Lipschitz outside the diagonal and y = x, there are , K > 0 such that
  
 
 2
2
z , y  K|z z|
dSR z, y dSR

z, z B(x, ).



Then applying (3.31) with z = k tk (x) + s; zk (x) and s small and using (3.29)
yields (as in Lemma 3.20, x,p denotes the geodesic starting at x with initial covector
p, note that x,p (s) belongs to V for small s)

 
(x) k tk (x) + s; zk (x)
  1 2
1 2  
dSR
k tk (x) + s; zk (x) , y dSR
(x, y )
2
2
 1 2 

 1 2
K 
k tk (x) + s; zk (x) x,p (s) + dSR
x,p (s), y dSR
(x, y )
2
2
2

  1 2  

1
KC(h, R)  
2
pk tk (x); zk (x) p s + (1 s)2 dSR

x, y dSR x, y
2
2
2


2 (x, y)




d
KC(h, R)  
2
s2 .
pk tk (x); zk (x) p dSR
=
x, y s + SR
2
2

3.4 Optimal Transport on Ideal and Lipschitz SR Structures

111

The quantity


KC(h, R)  
pk tk (x); zk (x) p
2
tends to 0 as (xk , pk ) tends to (x, p). We infer
) close enough to (x, p),
 that for (xk , pk
the derivative of the function s k tk (x) + s; zk (x) cannot be zero. This
contradicts (3.32).
It remains to prove the formula for T (x) and the uniqueness of minimizing geodesic between x and T (x) -almost everywhere. We need to show that

c (x) Supp() = expx

1
Dx
2

for all x M Supp() where is differentiable, which is the case for -almost
every x M by assertion (i) and Rademachers theorem. This is a consequence
of Lemma 2.15 applied to the function z (z) + c (y) at the point x with
y c (x). Moreover, again by Lemma 2.15, the geodesic from x to T (x) is unique
for -a.e. x M Supp(). Since T (x) = x for x S Supp(), the geodesic
is clearly unique also in this case.

Remark 3.18 If the sub-Riemannian structure is assumed to be ideal, then the potential can be shown to be locally semiconcave on the moving set.
Remark 3.19 The above arguments show that Theorem 3.17 remains true under
more general assumptions. Let (, g) be a complete sub-Riemannian structure on M
and , be two compactly supported probability measures in M with absolutely
continuous with respect to the Lebesgue measure. Assume that there are two open
sets 1 , 2 M with
(M \ 1 ) = 0

and

Supp() 2

such that the sub-Riemannian distance is Lipschitz in charts on (1 2 ) \ D.


Then there is existence and uniqueness of an optimal transport map with respect to
the sub-Riemannian quadratic cost.

3.5 Back to Examples


We conclude the present chapter with a list of examples for which we have existence
and uniqueness of optimal transport maps for the SR quadratic cost, that is the cost
c : M M [0, +) defined by
c(x, y) :=

1 2
d (x, y)
2 SR

x, y M.

112

3 Introduction to Optimal Transport

Given a cost function, we shall say that the Monge problem is well-posed, if we have
existence and uniqueness of optimal transport maps from an absolutely continuous
compactly supported measure to a compactly supported measure. All the examples
that we review below have already been encoutenred within the text.
Fat distributions. Recall (see Example 1.15) that a distribution on M is called fat
if, for every x M and every section X of with X(x) = 0, there holds

Tx M = (x) + X, (x),
where

X, (x) := [X, Z](x) | Z section of .

We saw that fat distributions do not admit non-trivial singular horizontal paths. This
means that any complete sub-Riemannian structure associated with a fat distribution
is ideal. In conclusion, by Theorem 3.17, the Monge problem for any sub-Riemannian
structure associated with a fat distributions is well-posed.
Two-generating distributions. A distribution is called two-generating if
Tx M = (x) + [, ](x)

x M.

Two-generating distributions do not admit Goh paths (see Example 2.1). By


Theorem 3.17, the Monge problem for any sub-Riemannian structure associated
with a two-generating distributions is well-posed.
Totally nonholonomic distributions on three-dimensional manifolds. Assume
that M has dimension 3, that is a nonholonomic rank-two distribution on M, and
define

:= x M | (x) + [, ](x) = R3 .
The set is called the singular set or the Martinet set of .
Proposition 3.21 Let be a totally nonholonomic distribution on a threedimensional manifold. Then, the set is a closed subset of M which is countably
2-rectifiable. Moreover, a non-trivial horizontal path : [0, 1] M is singular if
and only if it is included in .
Proof The first part will follow from Proposition 3.22 while the second part has
already been proved in Example 1.17.

Proposition 3.21 implies that for any pair (x, y) M M (with x = y) such that
x or y does not belong to , any sub-Riemannian minimizing geodesic between x
and y is nonsingular. Moreover has Lebesgue measure zero. As a consequence,

3.5 Back to Examples

113

by Remarks 3.13 and 3.19, the Monge problem is well-posed.


Medium-fat distributions. The distribution is called medium-fat if, for every
x M and every vector field X on M such that X(x) (x) \ {0}, there holds
Tx M = (x) + [, ](x) + [X, [, ]](x).
As shown in Example 2.1, medium-fat distributions do not admit non-trivial Goh
paths. As a consequence, the Monge problem for sub-Riemannian structures involving medium-fat distributions is well-posed.
Codimension-one nonholonomic distributions. Let M have dimension n and be a
nonholonomic distribution of rank n1. As in the case of nonholonomic distributions
on three-dimensional manifolds, we can define the singular set associated to the
distribution as

:= x M | (x) + [, ](x) = Tx M .
The following result holds.
Proposition 3.22 If is a nonholonomic distribution of rank n 1, then the set
is a closed subset of M which is countably (n 1)-rectifiable. Moreover, any Goh
path is contained in .
Proof The fact that is a closed subset of M is obvious. Let us prove that it is
countably (n 1)-rectifiable. Since it suffices to prove the result locally, we can
assume that we have

x V ,
(x) = Span X 1 (x), . . . , X n1 (x)
where V is an open neighborhood of the origin in Rn . Moreover, doing a change of
coordinates if necessary, we can also assume that (with coordinates (x1 , . . . , xn ))
X i = xi + i (x) xn

i = 1, . . . , n 1,

where each i : V R is a C function satisfying i (0) = 0. Hence, for any


i, j {1, . . . n 1}, we have
i j
X ,X =
and so

j
i

xi
xj

j
i
i
j
xn
xn

&
xn ,

114

3 Introduction to Optimal Transport





j
j
i
i
+
= x V |

i
j = 0
xi
xj
xn
xn
i, j {1, . . . , n 1}} .
For every tuple I = (i1 , . . . , ik ) {1, . . . , n1}k we denote by X I the smooth vector
field constructed by Lie brackets of X 1 , X 2 , . . . , X n1 as follows,
'

X I = X i1 , X i2 , . . . , X ik1 , X ik . . . .
We call k = length(I) the length of the Lie bracket X I . Since is totally nonholonomic, there is some positive integer r such that

Rn = Span X I (x) | length(I) r

x V .

It is easy to see that, for every I such that length(I) 2, there is a smooth function
gI : V R such that
x V .
X I (x) = gI (x)xn
Defining the sets Ak as

Ak := x V | gI (x) = 0 I such that length(I) k ,


we have
=

(Ak \ Ak+1 ) .

k=2

By the Implicit Function Theorem, it is easy to see that each set Ak \ Ak+1 can be
covered by a countable union of smooth hypersurfaces. Indeed assume that some
given x belongs to Ak \ Ak+1 . This implies that there is some J = (j1 , . . . , jk+1 ) of
length k + 1 such that gJ (x) = 0. Set I = (j2 , . . . , jk+1 ). Since gI (x) = 0, we have

gJ (x) =
Hence, either

gI
xj1 (x)

= 0 or


gI
gI
(x) +
(x)j1 (x) = 0.
xj1
xn

gI
xn (x)

= 0.

Consequently, we deduce that we have the following inclusion


A \A
k

k+1

length(I)=k



gI
x V | i {1, . . . , n} such that
(x) = 0 .
xi

3.5 Back to Examples

115

We conclude easily.
The fact that any Goh path is contained in is obvious.

As a consequence by Remarks 3.13 and 3.19, the Monge problem for subRiemannian structures involving codimension one distributions is well-posed.
Rank-two distributions in dimension four. Let (M, , g) be a complete subRiemannian manifold of dimension four, and let be a regular rank-two distribution,
that is satisfying
Tx M

'
'

= Span X 1 (x), X 2 (x), X 1 , X 2 (x), X 1 , X 1 , X 2 (x), X 2 , X 1 , X 2 (x)

for any local parametrization F = {X 1 , X 2 } of the distribution. In Example 1.19,


we saw that there is a smooth horizontal vector field X on M such that the singular
horizontal paths parametrized by arc-length are exactly the integral curves of X,
i.e. the curves satisfying
(t) = X( (t)).
For every x M, denote by O(x) the orbit of x by the flow of X and set

:= (x, y) M M | y
/ O(x) .
According to Remark 3.13, the following result holds:
Proposition 3.23 Under the assumption above, the function dSR is locally semiconcave in the interior of .
The above result allow us to obtain existence and uniqueness of optimal transport
maps in certain cases. Let us consider the distribution given in Example 1.18, that is
the distribution in R4 spanned by the vector fields
X 1 = x1 ,

X 2 = x2 + x1 x3 + x3 x4 .

As shown in Example 1.18, an horizontal path : [0, 1] R4 is singular if and


only if it satisfies, up to reparameterization by arc-length,


(t) = X 1 (t)

t [0, 1].

By the above proposition, we deduce that, for any complete metric g on R4 , the
sub-Riemannian distance function dSR is locally semiconcave on the set

/ Span{e1 } ,
= (x, y) R4 R4 | (y x)

116

3 Introduction to Optimal Transport

where e1 denotes the first vector in the canonical basis of R4 . Consequently, for any
pair of compactly supported probability measures , on M such that is absolutely
continuous with respect to the Lebesgue measure and


Supp ,
the Monge problem is well-posed.

3.6 Notes and Comments


In 1781, Monges original work [18] was concerned with the moving of soil that was
modelized as an optimal transport problem consisting in minimizing the transportation cost

|T (x) x| d(x),
(3.33)
R3

between continuous distributions of mass. The Monge problem was rediscovered


several decades later, in 1942, by Kantorovitch [15] who proved a duality theorem
to study the relaxed form of the problem (which is by now referred as Kantorovitch
problem). We refer the reader to the textbook [24] by Villani and references therein
for an historical account on the optimal transport theory.
The Kantorovitch duality theorem which is not precisely stated in the present
monograph appears through Theorem 3.7 and Corollary 3.1. Actually, our presentation of the theory leading to existence and uniqueness of optimal transport maps
closely follows the one of Gangbo and McCann in [13]. For sake of simplicity, we
restrict our attention to transportation problems between compactly supported probability measures from a smooth manifold into itself with continuous costs. Most of
the results of Sects. 3.13.2 remain true in the more general context of lower semicontinuous costs on the product of two Polish spaces and non-compactly supported
probability measures. We refer the reader to Villanis monograph [24] for general
statements.
As seen through Example 3.1, transport maps may not exist. In fact, Pratelli [20]
proved that transport maps do exist as soon as the initial measure is assumed to be
non-atomic. The Prokhrorov Theorem which is used in the proof of Theorem 3.1 can
be found in Billingsleys book [4]. Theorem 3.7 extends a result by Rockafellar [23]
about the sub-differentials of convex functions. The sub-TWIST condition introduced
in Sect. 3.3 is a natural extension of the classical TWIST condition (see [24]). Thanks
to Lemma 2.15, many costs obtained in a variational way do satisfy the sub-TWIST
condition. This is the case of the quadratic Euclidean cost appearing in Example
3.7, or of the quadratic geodesic cost appearing in Example 3.9. In fact, Examples
3.73.9 refer respectively to theorems by Brenier [5] and McCann [16]. This type of
result can be developed further by considering locally Lipschitz costs associated with

3.6 Notes and Comments

117

problems of calculus of variations involving Tonelli Lagrangians (see [3]) or even


with some optimal control problems (see [1]). As seen in Example 3.2, minimizers
of the original Monge problem with cost c(x, y) = |y x| in Rn may not be unique.
However, existence of optimal transport maps can be proved, see [24] and references
therein.
The study of Monge-type problems in sub-Riemannian geometry began with a
paper by Ambrosio and Rigot [2] about the transportation problem in the Heisenberg
group. Then, Agrachev and Lee [1] extended the well-posedness result of AmbrosioRigot to the case of sub-Riemannian quadratic costs which are Lipschitz in charts on
M M (see Remark 3.11). Then, Figalli and the author [12] removed the assumption
of Lipschitzness on the diagonal; this is Theorem 3.17. We observe that our proof
of assertion (ii) differs from the original proof in [12] which was based on a PansuRademacher Theorem. All these results are concerned with SR quadratic costs (that
2 ). As in the Euclidean case, the Monge problem for the non-quadratic
is c = dSR
cost c = dSR does not enjoy uniqueness. Using techniques developed by Champion
and De Pascale [8], De Pascale and Rigot [10] obtained an existence result for the
classical Monge problem in the Heisenberg group.
The local semiconcavity of some SR distances outside the diagonal is demonstrated in Theorem 3.14. Such regularity is fundamental and sometimes necessary.
First, it shows that distances of ideal sub-Riemannian structures share the same type
of properties as Riemannian distances, at least outside the diagonal. It can be useful
to get Sards theorems and as a consequence regularity properties of sub-Riemannian
spheres, see [21]. Then, the semiconcavity of the cost allows to consider probability
measures which do not charge rectifiable sets and hence not necessarily absolutely
continuous, see [24]. Finally, semiconcavity of the cost may be transfered to potentials (see Remark 3.18) and then permit to get a Monge-Ampre-like equation (see
Remark 3.1). This latter consequence is due to a famous theorem by Alexandrov (see
[11] ) which states that locally semiconvace functions are two times differentiable
almost everywhere. We refer the reader to [12] for further details on sub-Riemannian
Monge-Ampre equations, to [6, 22] for further details on semiconcave SR distances,
and to the Cannarsa-Sinestraris book [7] for an detailed exposition on semiconcavity.
Our list of examples already appeared in [12] which indeed contained an additional example about generic sub-Riemannian structures. Chitour, Jean and Trlat
[9] proved that generic SR structures of rank 3 do not admit singular curves. By
Theorems 3.14 and 3.17, this shows that the Monge problem for generic SR structures of rank 3 is well-posed. We refer the reader to [12] and references therein for
further details.
We do not know if the Monge problem (for the SR quadratic cost) is well-posed
for general sub-Riemannian structures. The method presented in this chapter requires
regularity properties for dSR . According to the Mitchell ball-box theorem (see [14,
17, 19]), the sub-Riemannian distance is always locally Hlder in charts. In Chap. 2,
we saw that given a complete sub-Riemannian structure and x M the function
y M dSR (x, y)

118

3 Introduction to Optimal Transport

is Lipschitz in charts on a dense subset of M. We do not know if this set has necessarily
full Lebesgue measure in M (note that the Sard Conjecture that we mentioned in
Sect. 2.6 would imply such a result). Anyway, such a result would not be sufficient to
prove the well-posedness of Monge problem for general sub-Riemannian structures.

References
1. Agrachev, A., Lee, P.: Optimal transportation under nonholonomic constraints. Trans. Amer.
Math. Soc. 361(11), 60196047 (2009)
2. Ambrosio, L., Rigot, S.: Optimal transportation in the Heisenberg group. J. Funct. Anal. 208(2),
261301 (2004)
3. Bernard, P., Buffoni, B.: Optimal mass transportation and Mather theory. J. Eur. Math. Soc.
9(1), 85121 (2007)
4. Billingsley, P.: Convergence of Probability Measures, 2nd edn. John Wiley & Sons Inc., New
York (1999)
5. Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Comm.
Pure Appl. Math. 44, 375417 (1991)
6. Cannarsa, P., Rifford, L.: Semiconcavity results for optimal control problems admitting no
singular minimizing controls. Ann. Inst. H. Poincar Anal. Non Linaire 25(4), 773802 (2008)
7. Cannarsa, P., Sinestrari, C.: Semiconcave functions, Hamilton-Jacobi equations, and optimal control. Progress in Nonlinear Differential Equations and their Applications, vol. 58.
Birkhuser, Boston (2004)
8. Champion, T., De Pascale, L.: The Monge problem in Rd . Duke Math. J. 157(3), 551572
(2011)
9. Chitour, Y., Jean, F., Trlat, E.: Genericity results for singular curves. J. Differ. Geom. 73(1),
4573 (2006)
10. De Pascale, L., Rigot, S.: Monges transport problem in the Heisenberg group. Adv. Calc. Var.
4(2), 195227 (2011)
11. Evans, L.C., Gariepy, R.: Measure Theory and Fine Properties of Functions. CRC Press, Boca
Raton, FL (1992)
12. Figalli, A., Rifford, L.: Mass transportation on sub-Riemannian manifolds. Geom. Funct. Anal.
20(1), 124159 (2010)
13. Gangbo, W., McCann, R.J.: The geometry of optimal transportation. Acta Math. 177(2), 113
161 (1996)
14. Jean, F.: Control of nonholonomic systems and sub-Riemannian geometry. Lectures given at
the CIMPA School Gomtrie sous-riemannienne, Beirut, Lebanon (2012)
15. Kantorovitch, L.: On the translocation of masses. C.R. (Doklady) Acad. Sci. URSS 37, 199201
(1942)
16. McCann, R.J.: Polar factorization of maps on Riemannian manifolds. Geom. Funct. Anal.
11(3), 589608 (2001)
17. Mitchell, J.: On Carnot-Carathodory metrics. J. Differ. Geom. 21(1), 3545 (1985)
18. Monge, G.: Mmoire sur la thorie des dblais et des remblais. Histoire de lAcadmie Royale
des Sciences de Paris, pp. 666704 (1781)
19. Montgomery, R.: A tour of subriemannian geometries, their geodesics and applications. Mathematical Surveys and Monographs, vol. 91. American Mathematical Society, Providence, RI
(2002)
20. Pratelli, A.: On the equality between Monges infimum and Kantorovitchs minimum in optimal
mass transportation. Ann. Inst. H. Poincar Probab. Statist. 43(1), 113 (2007)
21. Rifford, L.: propos des sphres sous-riemanniennes. Bull. Belg. Math. Soc. Simon Stevin
13(3), 521526 (2006)

References

119

22. Rifford, L., Trlat, E.: On the stabilization problem for nonholonomic distributions. J. Eur.
Math. Soc. 11(2), 223255 (2009)
23. Rockafellar, R.T.: Characterization of the subdifferentials of convex functions. Pacific J. Math.
17, 497510 (1966)
24. Villani, C.: Optimal Transport, Old and New. Springer-Verlag, Heidelberg (2008)

Appendix A

Ordinary Differential Equations

We recall here without proofs basic facts on ordinary differential equations. For
further details, we refer the reader to the textbook [1].
A function f : [a, b] Rn is said to be absolutely continuous, if for each > 0,
there exists > 0 such that for each family of disjoints intervals {]ai , bi [}iN included
in [a, b], and satisfying

bi ai < ,
iN

we have


| f (bi ) f (ai )| < .

iN

Any absolutely continuous function is continuous. In fact, a function f : [a, b] Rn


is absolutely continuous if and only if it is differentiable almost everywhere on [a, b],
d
f (t) is integrable with respect to the Lebesgue measure on
its derivative f(t) := dt
[a, b], and we have for each t [a, b],
df
= f (a) +
dt


a

d
f (s)ds t [a, b].
dt

A function f : [a, b] Rn is called absolutely continuous with square integrable


derivative if it is absolutely continuous on [a, b] and satisfies


f L 2 [a, b]; Rn .
Let M be a smooth manifold without boundary of dimension n 2. A function f :
[a, b] M is called absolutely continuous (resp. absolutely continuous with square
integrable derivative) if it is absolutely continuous (resp. absolutely continuous with

L. Rifford, Sub-Riemannian Geometry and Optimal Transport,


SpringerBriefs in Mathematics, DOI: 10.1007/978-3-319-04804-8,
The Author(s) 2014

121

122

Appendix A: Ordinary Differential Equations

square integrable derivative) in charts. Such a notion does not depend on the atlas
chosen to cover M.
The Gronwall lemma is a key tool to obtain estimates involving solutions of
differential equations.
Lemma A.1 (Gronwalls Lemma) Let > 0, : [0, ] : R be a continuous
function, and L 1 ([0; ], R). Assume that u : [0, ] R is a continuous function
satisfying

t

u(t) (t) +

(s)u(s)ds t [0, ].

Then there holds


u(t) (t) + e

t
0

(s)ds

s
0

(r )dr

(s)(s) ds t [0, ].

If, in addition, is nondecreasing, then


u(t) (t)e

t
0

(s)ds

t [0, ].

Let I R be an open interval, be an open subset of Rn , and f : I Rn


be a function satisfying the following property:
(HCP ) For every x , there exist > 0, a locally integrable function c : I
[0, +), and a nondecreasing function : [0, +) [0, +) with
(h) 0 as h 0 such that
| f (t, y) f (t, z)| c(t)(|y z|) and | f (t, y)| c(t)
for almost all t I and all y, z B(x, ).
Given (t0 , x0 ) I , our aim is to solve locally the following Cauchy problem
x(t)
= f (t, x(t)),

a.e. t, x (t0 ) = x0 .

(A.1)

Theorem A.2 (Cauchy-Peanos Theorem) Assume that f : I Rn satisfies


the property (HC P ). Then for every (t0 , x0 ) I , there is > 0 such that the
Cauchy problem (A.1) admits a solution on [t0 , t0 + ].
Remark A.1 The Cauchy-Peano is only an existence result. In the autonomous case,
it says that if f : Rn is continuous then for every x0 , the Cauchy problem
x(t)
= f (x(t)), x(0) = x0 ,
admits at least one solution locally. A counterexample to uniqueness is for example
given by f : R R defined by

Appendix A: Ordinary Differential Equations

f (x) :=

123

|x| x R.

The Cauchy problem x(t)


= f (x(t)), x(0) = 0 admits two smooth solutions:
x(t) = 0 and x(t) =

t2
t R.
4

Let I R be an open interval, be an open subset of Rn , and f : I Rn


be a function satisfying the following property:
(HCC ) For every x , there exist > 0 and a locally integrable function c : I
[0, +) such that
| f (t, y) f (t, z)| c(t)|y z| and | f (t, y)| c(t)
for almost every t I and all y, z B(x, ).
The following result provides existence and uniqueness for the Cauchy problem
(A.1).
Theorem A.3 (Cauchy-Carathodorys Theorem) Assume that f : I Rn
satisfies the property (HCC ). Then for every (t0 , x0 ) I , there is > 0 such
that the Cauchy problem (A.1) admits a solution x : [t0 , t0 + ] . If
y : [t0 , t0 + ] (or y : [t0 , t0 ] ) is an other solution of (A.1), then
x(t) = y(t) for all t [t0 , t0 + ].
Remark A.2 In the autonomous case, the Cauchy-Carathodory Theorem says that
if f : Rn is locally Lipschitz then for every x0 , the Cauchy problem
x(t)
= f (x(t)), x(0) = x0 ,
admits a solution locally and this solution is unique.
By the Cauchy-Carathodory Theorem, under assumption (HCC ), for every
(t0 , x0 ) I , the unique solution to the Cauchy problem (A.1) can be extended
to a maximal interval of the form I = (, ) with < t0 < and R {},
R {+}. Under additional assumptions, we can sometimes insure that any
solution can be extended to R.
Theorem A.4 Let f : RRm Rn be a function satisfying the assumptions (HCC )
1 (R, [0, +))
(with = Rn ) and such that there exist two functions K , M in L loc
such that
| f (t, x)| K (t)|x| + M(t) a.e. t R x Rn .
Then any solution of x = f (x(t)) can be extended to R.
Remark A.3 For sake of simplicity, we stated Theorem A.4 in the case of a nonautonomous function defined on R Rn . The same results holds for a function defined

124

Appendix A: Ordinary Differential Equations

on I Rn where I is an open interval in R. Namely, any solution to the Cauchy


problem can be extended to I .
Let I R be an interval and A L 1 (I ; Mn (R)) be a function from I into the
set of n n matrices denoted by Mn (R). By the above results, for every t0 I , the
Cauchy problem
= A(t)S(t),
S(t)

a.e. t I, S(t0 ) = In ,

has a unique solution which is defined on I . In the same way, the Cauchy problem
Y (t) = Y (t)A(t), a.e. t I, Y (t0 ) = In ,
admits a solution defined on I . Hence, the function Z : I Mn (R) defined as
Z (t) := Y (t)S(t) for every t I , satisfies for almost every t I ,

Z (t) = Y (t)S(t) + Y (t) S(t)


= Y (t)A(t)S(t) + Y (t)A(t)S(t) = 0.
Since Z (t0 ) = In , we deduce by uniqueness, that Z (t) = In for every t I . This
shows that the matrix S(t) is invertible for every t I .
1 (I ; Rn ), t I , and Rn . The solution to the
Proposition A.5 Let C L loc
0
0
Cauchy problem

(t) = A(t)(t) + C(t),


is given by


(t) = S(t)0 + S(t)

f or a.e. t I, (t0 ) = 0

(A.2)

S(s)1 C(s)ds, t I.

(A.3)

t0

We check easily that the function given by (A.2) satifies (A.3).

Appendix B

Elements of Differential Calculus

We recall here basic facts of first order calculus in normed vector spaces and less
basic facts of second order calculus. We refer the reader to textbook [2] for further
details on differential calculus in normed spaces. The results of second order calculus
are taken from the textbook [3].

B.1 First Order Calculus


Given two normed vector spaces (X,

X ) and (Y,

Y ), we denote by L (X, Y )
the space of continuous linear maps from X to Y . This space is equipped with the
operator norm (we denote alternatively by T u or T (u) the image of u by the
operator T )



T
= sup
T (u)
Y | u X,
u
X = 1 .
Let (X,

X ) and (Y,

Y ) be two normed vector spaces, U be an open subset
of X and let F : U X Y be a given mapping. Let u U . We say that F is
differentiable at u provided there is a continuous linear map Du F : X Y such that
for every > 0, there is > 0 such that

0 <
u u
X <

 



F(u) F u Du F u u

< .

u u

This property can also be written as

 



F(u) F u Du F u u

lim
= 0,

u u

uu
X
or




F(u) = F(u)
+ Du F u u +
u u
X o(1).

L. Rifford, Sub-Riemannian Geometry and Optimal Transport,


SpringerBriefs in Mathematics, DOI: 10.1007/978-3-319-04804-8,
The Author(s) 2014

125

126

Appendix B: Elements of Differential Calculus

The map F is said to be differentiable in U X if it is differentiable at every u U .


The map
DF : U L (X, Y )
u Du F
is called the derivative of F. If D F is a continuous map on U (where L (X, Y ) has
the norm topology) we say that F is of class C 1 on U . Finally we recall that given
a function F of class C 1 on an open set U X and a point u U , the derivative
Du F is called singular if it is not surjective and in that case u is called a critical
point. The Inverse Function Theorem allows to obtain a local openness at first order.
Theorem B.1 (Inverse Function Theorem) Let U be an open set of Rn , F : U
Rn be a function of class C 1 , and x U be such that Dx F is not singular. Then
there exists neighborhoods U U of x and V of F(x) such that F|U : U V is
a C 1 diffeomorphism.
One of its corollary, the Lagrange Multiplier Theorem, plays a major role in
Chap. 2.
Theorem B.2 (Lagranges Multipliers Theorem) Let (X,

X ) be a normed vector
space, U be an open subset of X , and E : U Rn and C : U R two mappings
of class C 1 on U . Assume that u U satisfies the following property:
 
 
C u C(u) for every u U such that E(u) = E u .
Then there exist 0 R and Rn with (0 , ) = (0, 0) such that
Du E = 0 Du C.
Proof Define the mapping : U X R Rn by
(u) := (C(u), E(u)) , u U .
The mapping is of class C 1 on U . We claim that u is necessarily a critical point
of , that is Du is singular. We argue by contradiction. If u is not a critical point,
the continuous linear map Du : X R Rn is surjective. Then there exists a
linear subspace Y of X of dimension n + 1 such that the restriction of Du to Y is
an isomorphism. Let y1 , . . . , yn+1 be a basis of Y and B be an open neighborhood
of 0 in Rn+1 such that
u +

n+1


i yi U



= 1 , . . . , n+1 B.

i=1

The mapping

: B Rn+1




n+1
= 1 , . . . , n+1 u + i=1
i yi

Appendix B: Elements of Differential Calculus

127

is of class C 1 on B with a derivative which is invertible at = 0. Hence, by the


Inverse Function Theorem, the point (u)
= (C(u),
E(u))
belongs to the interior

of the image of (B).


Thus for > 0 small enough, there is y Y with u + y U
such that

   
 
u + y = C u , E u ,
which contradicts. In consequence, u is a critical point of . Hence, there exists a
non-zero n + 1-tuple p = (0 , ) (with 0 R and Rn ) which is orthogonal
to the image of Du , that is such that
0 Du C + Du E = 0.


This concludes the proof.

B.2 Second Order Study


Let us denote by L 2 (X, Y ) the space of all continuous bilinear maps from X X
to Y . We can equip it with the operator norm



T
= sup
T (u 1 , u 2 )
Y | u 1 , u 2 X,
u 1
X =
u 2
X = 1 .
Given an open set U X and a mapping F : U X Y , we define


D 2 F := D D F : U X L 2 (X, Y )
if it exists (where we identify L (X, L (X, Y )) with L 2 (X, Y ). If D 2 F exists and
is continuous on U , we say that F is of class C 2 on U . In this case, the second
derivative Du2 F is symmetric at any point, that is
Du2 F (v, w) = Du2 F (w, v) v, w X, u U .
If F : U X Y is a function of class C 2 then we have for every u U the
second order Taylor formula
F(u + h) = F(u) + Du F(h) +

1 2
D F (h, h) +
h
2X o(1),
2 u

which means that

F(v) F(u) Du F (v u) 1 D 2 F (v u, v u)

2 u
Y
lim
= 0.
vu

v u
2X

128

Appendix B: Elements of Differential Calculus

By the Inverse Function Theorem, any function of class C 1 is locally open around any
point with an invertible derivative. We are going to provide a second-order sufficient
condition for local openness around critical points.
Let (X,

X ) be a normed vector space, N be a positive integer, U be an open
subset of X and F : U R N be a mapping of class C 2 on U . Given a critical
point u U , we call corank of u, the quantity
 

corank F (u) := N dim Im Du F .
We also recall that if Q : X R is a quadratic form (that is Q is defined by
Q(v) := B(v, v) with B : X X R a symmetric bilinear form), we define its
negative index by


ind (Q) := max dim(L) | Q |L\{0} < 0 ,
where Q |L\{0} < 0 means
Q(u) < 0 u L \ {0}.
The following result provides a sufficient condition for local openness around a
critical point at second order.
Theorem B.3 Let F : U R N be a mapping of class C 2 in an open set U X
and u U be a critical point of F of corank r . If


ind Du2 F

|Ker(Du F)

 

r Im Du F
\ {0},

(B.1)

then the mapping F is locally open at u,


that is the image of any neighborhood of u
is an neighborhood of F(u).



In the above statement, Du2 F |Ker(D F) refers to the quadratic mapping from

Ker(Du F) to R N defined by

Du2 F


|Ker(Du F)

(v) := Du2 F (v, v) v Ker(Du F).

The following result is a quantitative version of the previous theorem, it is useful in


Sect. 3.4. (We denote by B X (, ) the balls in X with respect to the norm

X .)
Theorem B.4 Let F : U R N be a mapping of class C 2 in an open set U X
and u U be a critical point of F of corank r . If (B.1) holds, then there exist
, c (0, 1) such that for every (0, ) the following property holds: For every
u U , z R N with

u u

X < , |z F(u)| < c 2 ,

Appendix B: Elements of Differential Calculus

129

there are w1 , w2 X such that u + w1 + w2 U ,




z = F u + w1 + w2 ,
and

w1 Ker (Du F) ,
w1
X < ,
w2
X < 2 .

The proof of Theorems B.2 and B.3 that we give in the next sections are taken
from the Agrachev-Sachkov textbook [3] and the Agrachev-Lee article [4].
We need two preliminary lemmas.
Lemma B.5 Let G : Rk Rl be a mapping of class C 2 with G(0) = 0. Assume
that there is
 


v Ker(D0 G) with D02 G v , v Im D0 G ,
such that the linear mapping
w Ker(D0 G) ProjK



D02 G v , w K

(B.2)

is surjective, where K := Im(D0 G) and ProjK : Rl K denotes the orthogonal projection onto K . Then there is a sequence {u i }i converging to 0 in Rk such
that G(u i ) = 0 and Du i G is surjective for any i.
 
Proof Let E avector space in Rk such that Rk = E Ker(D0 G). Since D02 G v , v
belongs to Im D0 G there is v E such that
 
 
1
D0 G v = D02 G v , v .
2
Define the family of mappings { }>0 : E Ker(D0 G) Rl by
(z, t) :=


1 2
3
4
5
G

t
+

z
(z, t) E Ker(D0 G), > 0.
5

For every > 0, is of class C 2 on E Ker(D0 G) Rl and its derivative at


(z, t) = (0, 0) is given by
D(0,0) (Z , T ) = D2 v +4 v G(Z ) +

1
D 2 4 G(T ),
2 v + v

for any (Z , T ) E Ker(D0 G). For every (Z , T ) E Ker(D0 G), the first term
of the right-hand side D2 v +4 v G(Z ) tends to D0 G(Z ) as tends to 0 and since

130

Appendix B: Elements of Differential Calculus



 2
  2
1
1
2
4
4 


o(1)
D
D
G(T
)
=
G(T
)
+
D
G

,
T
+
v

2
4
0
0
2 v + v
2




 
1
= 2 D02 G 2 v + 4 v , T + 2 v + 4 v  o(1) ,

the second term tends to D02 G(v, T ) as tends to 0. By (B.2), the linear mapping


(Z , T ) E Ker(D0 G) D0 G(Z ) + D02 G v , T Rl
is surjective. Then there is > 0 such that D0 is surjective for all (0, ).
Therefore for every (0, ) the set


(z, t) E Ker(D0 G) | (z, t) = 0
is a submanifold of class C 2 of dimension k l > 0 which contains the origin. Then
there is a sequence {(z i , ti )}i converging to the origin such that 1/i (z i , ti ) = 0 and
D(zi ,ti ) 1/i is surjective for all i large enough. Thus setting
u i :=

1
1
1
1
v + 3 ti + 4 v + 5 z i i,
2
i
i
i
i

we get G(u i ) = 0 and Du i G surjective for all i large enough. This proves the
lemma.


Lemma B.6 Let Q : Rk Rm be a quadratic mapping such that


 
ind Q m, Rm \ {0}.

(B.3)

Then the mapping Q has a regular zero, that is there is v Rk such that Q(v) = 0
and Dv Q is surjective.
Proof Since Q is a quadratic mapping, there is a symmetric bilinear map B : Rk
Rk Rm such that
Q(v) = B(v, v) v Rk .
The kernel of Q, denoted by Ker(Q) is the set of v Rk such that
B(v, w) = 0 w Rk .
It is a vector subpace of Rk . Up to considering the restriction of Q to a vector space
E satisfying E Ker(Q) = Rk , we may assume that Ker(Q) = 0. We now prove
the result by induction on m.
In the case m = 1, we need to prove that there is v Rk with Q(v) = 0 and
Dv Q = 0. By (B.3), we know that ind (Q) 1 and ind (Q) 1, which
means that there are two vector lines L + , L in Rk such that Q |L + \{0} < 0 and

Appendix B: Elements of Differential Calculus

131

Q |L \{0} > 0. Then the restriction of Q to L + L is a quadratic form which is


sign-indefinite. Such a form has regular zeros.
Let us now prove the statement of the lemma for a fixed m > 1 under the assumption that it has been proven for all values less than m. So we consider a quadratic
mapping Q : Rk Rm satisfying (B.3) and such that Ker(Q) = {0}. We distinguish
two cases:
First case: Q 1 (0) = {0}.
Take any v = 0 such that Q(v) = 0. If v is a regular point, then the statement of
the lemma follows. Thus we assume that v is a critical point of Q. Since Dv Q(w) =
2B(v, w) for all w Rk and Ker(Q) = {0}, the derivative Dv Q : Rk Rm cannot
be zero. Then its kernel E = Ker(Dv Q) has dimension k r with r := rank(Dv Q)
[1, m 1]. Set F := Im(Dv Q) and define the quadratic form
Q : E Rkr F Rmr
by




Q(w)
:= Proj F Q(w) w E,

where Proj F : Rm F denotes the orthogonal projection to F. We have for every


F and every w E,

= Q(w).
Q(w)
We claim that ind ( Q) m r , for every F \ {0}. As a matter of fact, by
assumption, for every F \ {0} there is a vector space L of dimension m such that
( Q)|L\{0} < 0. The space L E has dimension at least m k as the intersection
of L of dimension m and E of dimension k r in Rk . By induction, we infer that Q
Im(Dv Q) and
has a regular zero w E = Ker(Dv Q), that is Q(w)
 

w F
w E = Ker(Dv Q) Proj F B( w,
is surjective. Define F : Rk Rm by


F(u) := Q v + u u Rk .
The function F is of class C 2 verifies D0 F = Dv Q, D02 F = B and the assumptions
of Lemma B.5 are satisfied with v = w.
We deduce that Q has a regular zero as well.
Second case: Q 1 (0) = {0}.
In fact, we are going to prove that this case cannot appear. First we claim that Q is
surjective. Since Q is homogeneous (Q(r v) = r 2 Q(v) for all v Rk and r R),
we have


Q(Rk ) = r Q(v) | r 0, v Sk1 .

132

Appendix B: Elements of Differential Calculus

The set Q(Sk1 ) is compact, hence Q(Rk ) is closed. Assume that Q(Rk ) = Rm
and take x = Q(v) on the boundary of Q(Rk ). Then x is necessarily a critical point
for Q. Proceeding as in the first case, we infer that x = Q(w) for some non-critical
point. This gives a contradiction. Then we have Q(Rk ) = Rm . Consequently the
mapping
Q
: Sk1 Sm1
Q := |Q|
Q(v)
v |Q(v)|
is surjective. By Sards Theorem, it has a regular value x, that is x Sm1 such that
Dv Q is surjective for all v Sk1 satisfying Q(v) = x for all v Sk1 . Among the
set of v Sk1 such that Q(v) = x take v for which |Q(v)| is minimal, that is such
that
Q(v) = ax

and
a > 0, v Sk1 ,

Q(v) = ax = a a.

In other terms, if we define the smooth function : (0, +) Sk1 Rm as,


(a, v) := Q(v) ax, a > 0, v Sk1 ,
then the pair (a,
v ) satisfies
a a for every (a, v) (0, +) Sk1 with (a, v) = 0.
By the Lagranges Multipliers Theorem (Theorem B.2), there is 0 R and Rm
with (0 , ) = (0, 0) such that
Dv Q = 0 and

x = 0 .

Note that we have for every h Tv Sk1 Rk , we have


 
1
Dv Q(h) =    Dv Q(h) + [Dv |Q|(h)] Q v
 Q v 
=

(B.4)

1
Dv Q(h) + a [Dv |Q|(h)] x.
a

Consequently, if 0 = 0 (that is if (a,


v ) is a critical point of ), then Dv Q = 0
which contradicts the fact Dv Q is surjective (because cannot be collinear with x
by 2-homogeneity of Q). In conclusion, we can assume without loss of generality
v ) is not a critical point of , the set
that 0 = 1. Since (a,


C = (a, v) (0, +) Sk1 | (a, v) = 0

Appendix B: Elements of Differential Calculus

133

k1 of dimension k m in a neighborhood
is a smooth
 submanifold of (0, +) S
of a,
v . Then for every (h a , h v ) Ker(Da,
v ), which is equivalent to h a = 0 and
Dv Q(h v ) = 0 with h v Tv Sk1 , there is a smooth curve = (a , v ) : (, )
C such that (0) = (a,
v ) and (0) = (h a , h v ). Then differentiating two times the
2
equality ( (t)) = 0 and using that a2 = 0 and
v ) = Dv Q = 0, we
v (a,
get
2  
 
a,
v = (0) x = (0).
2 a,
v = (0)
v
a

Note that v2 = Q. Furthermore, since (a,


v ) is solution to our minimization problem
with constraine, we have a (t) a = a (0) for all t (, ). Then we have
2



Q(h) 0 h Ker Dv Q Tv Sk1 .
Since Q(v) = a > 0 we have indeed



Q(h) 0 h Ker Dv Q Tv Sk1 Rv =: L .
Let us compute the dimension of the non-negative subspace L of the quadratic form
Q. Since Dv Q is surjective, we have

 
dim Im Dv Q = m 1.


Which means (remember (B.4)) that Im Dv Q |Sk1 has dimension m or m 1. But
Dv Q = 0 with = 0, thus we have necessarily

 
dim Im Dv Q |Sk1 = m 1
and


 


dim Ker Dv Q Tv Sk1 = dim Ker Dv Q |Sk1 = k 1 (m 1)
= k m.
Consequently, dim(L) = k m + 1, thus ind ( Q) has to be m 1, which
contradicts the hypothesis of the lemma. This shows that Q 1 (0) = {0} is impossible
and concludes the proof of the lemma.


We are ready to prove Theorem B.3. Set




 

S := Im Du F
| || = 1 R N .

134

Appendix B: Elements of Differential Calculus

By assumption (B.1), for every S, there is a subspace E Ker (Du F) of


dimension r such that


Du2 F |E \{0} < 0.



By continuity of the mapping Du2 F |E , there is an open set O S such

that


Du2 F |E \{0} < 0 O .

Choose a finite covering


S=

I


Oi

i=1

and a finite dimensional space E X such that






Im Du F|E = Im Du F .
I
E i X
Then the restriction F of F to the finite dimensional subspace E + i=1
satisfies





r Im Du F
\ {0},
ind Du2 F

|Ker(Du F)

with


 
 

r = corank F u := N dim Im Du F = N dim Im Du F .


K
and define the quadratic mapping Q : Ker(Du F)
Set K := Im Du F
by





Du2 F (v, v) v Ker Du F ,
Q(v) := ProjK
where ProjK : R N K denotes the orthogonal projection onto K . The assumption (B.3) of Lemma B.6 is satisfied. Then by Lemmas B.6, Q has a regular zero,
such that
that is v Ker(Du F)
 
Q v = 0



 
Du2 F v , v K = Im Du F

and
Dv Q surjective






w Ker Du F ProjK Du2 F v , w K surjective.

Appendix B: Elements of Differential Calculus

135

u + v) F(
u)
Setting G(v) := F(
and applying Lemma B.5, we get a sequence
and Du i F is surjective for any i. By
{u i }i converging to u such that F(u i ) = F(u)
the Inverse Function Theorem, this implies that F is locally open at u.

Proceeding as in the proof of Theorem B.3, we may assume that X is finite


dimensional. We may also assume that u = 0 and F(u)
= 0. As before, set K :=
(Im (Du F)) and define the quadratic mapping Q : Ker(D0 F) K by
Q(v) := ProjK



D02 F (v, v) v Ker (D0 F) ,

where ProjK : R N K denotes the orthogonal projection onto K . By (B.1) and


Lemmas B.6, Q has a regular zero v Ker(D0 F). Let E be a vector space in Rk
such that X = E Ker(D0 F). Define G : E Ker(D0 F) R N by
G(z, t) := D0 F(z) +

1 2
D0 F (t, t) (z, t) E Ker(D0 F).
2

Then assumptions of Lemma B.5 are satisfied and there is a sequence {(z i , ti )}i
converging to 0 such that G(z i , ti ) = 0 and D(zi ,ti ) G is surjective for all i.
Lemma B.7 There are , c > 0 such that the image of any continuous mapping
G : B(0, 1) R N with





G(u) | u = (z, t) B X (0, 1)
sup G(u)
(B.5)

contains the ball B(0,


c).
Proof This is a consequence of the Brouwer Theorem which asserts that any con
tinuous mapping from B(0,
1) Rn into itself has a fixed point, see [5]. Let i large
enough such that u i := (ti , z i ) belongs to B(0, 1/4). Since Du i G is surjective, there
is a affine space V of dimension N which contains u i and such that Du i G |V is invertible. Then by the Inverse Function Theorem, there is a open ball B = B X (u i , ) V
of u i in V such that the mapping
G |V : B G |V (B) R N
is a smooth diffeomophism. We denote by G : G |V (B) B its inverse. The set

c). Taking c > 0 sufficiently small we may


G |V (B) contains some closed ball B(0,
assume that



c).
G (y) B X u i , /4 y B(0,
There is > 0 such that any continuous mapping G : B X (0, 1) R N verifying
(B.5) satisfies

G(u)
G |V (B) u B X (u i , /2) V

136

Appendix B: Elements of Differential Calculus






u B X (u i , /2) V.
 G G (u) u 
4

and

Let G : B X (0, 1) R N be a continuous mapping verifying (B.5) and y B(0,


c)
be fixed. By the above construction, the function
: B X (G (y), /4) B X (G (y), /4)
defined by


(u) := u G G (u) + G (y) u B X (G (y), /4),
is continuous from B X (G (y), /4) into itself. Thus by Brouwers Theorem, it has a
fixed point, that is there is u B X (G (y), /4) such that
(u) = u

G(u)
= y.

This concludes the proof of the lemma.

Define the family of mappings { }>0 : E Ker(D0 F) R N by


(z, t) :=


1 2
F

z
+
t
(z, t) E Ker(D0 F), > 0.
2

By Taylors formula at second order for F at 0, we have


(z, t) = G(z, t) + o(1),
as tends to 0. Then there is > 0 (with |(2 , )| 1/2) such that for every
(0, ),



(z, t) E Ker(D0 F) B(0, 1).


2



By Lemma B.7 applied to G = , we infer that B(0,


c) is contained in B(0, 1) ,
which in turn implies that for every z R N such that |z| = |z F(u)|
< c2 , there
are w1 , w2 in X such that
| (z, t) G(z, t)|

z = w1 + w2 , w1 Ker(Du F),
w1
X < ,
w2
X < 2 .
Let us now show that the above result holds uniformly for u close to u = 0. Since F
Moreover, again
is C 1 , the vector space Ker(Du F) is transverse to E for u close to u.
),
by C 1 regularity, for every > 0, there is > 0 such that for every u B X (u,


Ker(Du F) B(0, 1) y + z X | y Ker(Du F) B(0, 1),
z
X < .

Appendix B: Elements of Differential Calculus

137

Therefore, there is > 0, such that for every u B X (u,


), there is a vector space
Wu X such that (Wu could be reduced to {0})


X = E Wu Ker Du F ,
and there are linear mappings


1 : Ker(D0 F) Wu , 2 : Ker(D0 F) Ker Du F
such that for every t Ker(D0 F), we have




t = 1 (t) + 2 (t), 1 (t) X K |t|, 1 (t) X K |t|,
for some constant K > 0 (which depends on Ker(D0 F),E, and

X ). Given
) and (0, ) we define G : E Ker(D0 F) B(0, 1) R N by
u B X (u,


1 

G(z,
t) := 2 F u + 2 z + 2 1 (t) + 2 (t) F(u) ,



for every (z, t) E Ker(D0 F) B(0, 1). Taking and > 0 smaller if necessary,
by Taylors formula for F at u at second order, by the above construction and by the
fact that Du F and Du2 are respectively close to D0 F and D02 F, we may assume that
(B.5) is satisfied. We conclude easily.


References
1. Hirsch, M.W., Smale, S.: Differential Equations, Dynamical Systems, and Linear Algebra. Pure
and Applied Mathematics, vol. 60. Academic, New York (1974)
2. Abraham, R., Marsden, J.E., Ratiu, T.: Manifolds, tensor analysis, and applications. Global
Analysis Pure and Applied: Series B, 2. Addison-Wesley Publishing Co., Reading (1983)
3. Agrachev, A.A., Sachkov, Y.L.: Control Theory from the Geometric Viewpoint. Encyclopaedia
of Mathematical Sciences, vol. 87, Springer, Heidelberg (2004)
4. Agrachev, A., Lee, P.: Optimal transportation under nonholonomic constraints. Trans. Amer.
Math. Soc. 361(11), 60196047 (2009)
5. Bredon, G.E.: Topology and Geometry. Graduate Texts in Mathematics, vol. 139. Springer, New
York (1993)

Index

Symbols
c-convex, 84
c-cyclically monotone, 82
c-subdifferential, 86
c-transform, 84

C
Concatenation
of controls, 16
of paths, 23
Control, 13
regular, 20
singular, 20
strictly abnormal, 68
Critical point, 60, 126

D
Degree of nonholonomy, 10
Distribution, 1
codimension-one nonholonomic, 113
contact (and contact form), 11, 25
fat, 24, 112
Martinet, 11, 25
medium-fat, 69, 113
totally nonholonomic, 9
trivial, 1
two-generating, 112

E
End-Point mapping, 13
Extremal
abnormal, 21
normal, 47, 52

F
Frame
global, 2, 12
local, 1
orthonormal, 34
Function
absolutely continuous, 121
absolutely continuous with
integrable derivative, 121
cost, 77
Lipschitz in charts, 58, 94
locally semiconcave, 97

square

G
Generating family, 4
orthonormal, 34
Goh condition, 68
path, 68
Gronwalls Lemma, 122
H
Hrmander condition, 9
Heisenberg group, 70
Horizontal path, 12
length (of a), 34
minimizing, 37
regular, 23
singular, 23
K
Kantorovitch potential, 85
L
Lie

L. Rifford, Sub-Riemannian Geometry and Optimal Transport,


SpringerBriefs in Mathematics, DOI: 10.1007/978-3-319-04804-8,
The Author(s) 2014

139

140
Lie (cont.)
algebra, 8
bracket, 5, 6

M
Martinet
set, 112
surface, 25, 27
Method of characteristics, 49
Minimizing geodesic, 38
singular, 46
strictly abnormal, 68

N
Negative index (of a quadratic form), 60, 128

O
Optimal transport
plan, 82
Optimal transport problem
Kantorovitch, 80
Monge, 77

R
Rank
of a control, 16
of an horizontal path, 23

S
Set
contact, 86

Index
moving, 104
singular, 112
static, 104
Singular, 20
Sub-differential, 92
Sub-Riemannian
ball, 34
distance, 34
energy, 37
exponential map, 52
Hamiltonian, 47, 52
norm, 34
Sub-Riemannian structures, 33
complete, 43
ideal, 97
Lipschitz, 100
Sub-TWIST condition, 93
super-differential, 92
Support (of a measure), 81

T
Theorem
Brenier, 94
Cauchy-Carathodorys, 123
Cauchy-Peano, 122
Chow-Rashevsky, 32
Inverse Function, 126
Lagranges Multipliers, 126
McCann, 95
SR Hopf-Rinow, 40
Trajectory, 13
Transport
map, 77
plan, 80