
2017 IEEE 56th Annual Conference on Decision and Control (CDC)

December 12-15, 2017, Melbourne, Australia

A Near-Optimal Decoupling Principle for Nonlinear Stochastic Systems


Arising in Robotic Path Planning and Control∗
Mohammadhussein Rafieisakhaei1 , Suman Chakravorty2 and P. R. Kumar1

Abstract— We consider nonlinear stochastic systems that arise in path planning and control of mobile robots. As is typical of almost all nonlinear stochastic systems, optimally solving the problem is intractable. Moreover, even if obtained, the optimal solution would require centralized control, while the path planning problem for mobile robots requires decentralized solutions. We provide a design approach that yields a tractable design that is quantifiably near-optimal. We exhibit a decoupling result under a small noise assumption, consisting of the optimal open-loop design of a nominal trajectory followed by a decentralized feedback law to track this trajectory. As a corollary, we obtain a trajectory-optimized linear quadratic regulator design for stochastic nonlinear systems with Gaussian noise.

I. INTRODUCTION

Many robotic systems, in particular mobile aerial and ground robots, are equipped with noisy actuators that require feedback compensation, or planning ahead with a policy that accounts for the random perturbations. Simply ignoring the noise and planning for the unperturbed equivalent of the stochastic system can yield crucial errors leading to failure to reach the end-goal, or result in the system falling into unsafe states. Moreover, the solution should not require fully centralized control, since that would require pervasive constant communication among all robots.

In a stochastic setting, the general problem of sequential decision-making can be formulated as a Markov Decision Problem (MDP) [1], [2]. The optimal solution of the stochastic control problem can be obtained iteratively by value or policy iteration methods that solve the Hamilton-Jacobi-Bellman equations [2]. Except in special cases, such as the linear Gaussian setting, this involves discretization of the underlying spaces [3], an approach whose scalability faces the curse of dimensionality [4]. As a result, the solutions require a computation time that is provably exponential in the state dimension, in a real-number-based model of complexity, without any assumption that P ≠ NP [5].

Many approaches have been proposed based on their tractability. Model Predictive Control (MPC)-based methods [6], [7], robust formulations [8], [9], and other designs that relate to Pontryagin's Maximum Principle [10] are some of the methods that have been successfully used as surrogate design approaches. Another popular approach utilizes Differential Dynamic Programming (DDP) [11] and DDP-based variations such as Stochastic DDP [12], iLQR, and iLQG [13]. Stochastic DDP relies on a second-order approximation of both the dynamics and the cost, whereas iLQR and iLQG use a second-order approximation of the cost but a first-order linearization of the dynamics. These methods are iterative and attempt to find "locally-optimal" solutions in a tube around a nominal trajectory [13] by coupling the design of the feedback policy and the nominal trajectory of the system.

In this paper, we address the nonlinear stochastic control problem and propose an architecture under which the decoupled design of an optimal open-loop control sequence and a decentralized feedback policy is both tractable and near-optimal. In particular, we show that under a small noise assumption, the decoupling into globally-optimal trajectory design and a decentralized feedback control law holds for fully-observed nonlinear stochastic systems of the type of interest in mobile robotic systems.

The design can be broken into two parts: i) an open-loop optimal control problem that designs the nominal trajectory of the LQR controller, which respects the nonlinearities as well as state and control constraints; and ii) the design of a decentralized LQR policy around the optimized nominal trajectory. The quality of the design is rigorously quantified by the main results of the paper. We quantify the first-order stochastic error for small noise levels based on large deviations theory. We thereby arrive at a Trajectory-optimized decoupled Linear Quadratic Regulator (T-LQR) design for fully-observed nonlinear stochastic systems under Gaussian small-noise perturbations.

The organization of the paper is as follows. Section II states a simple large deviations result for a linear Gaussian system. Section III defines a general stochastic control problem for a fully-observed system. Section IV provides the main results by first analyzing the effect of feedback compensation on the linearization error, and then providing the state and control error propagations along with probabilistic bounds based on the large deviations results developed in Section II. It also describes the T-LQR approach and proves its near-optimality. Finally, Section V provides a design based on T-LQR for a non-holonomic car-like robot and provides numerical results illustrating the proposed design approach.

*This material is based upon work partially supported by NSF under Contract Nos. CNS-1646449 and Science & Technology Center Grant CCF-0939370, the U.S. Army Research Office under Contract No. W911NF-15-1-0279, and NPRP grant NPRP 8-1531-2-651 from the Qatar National Research Fund, a member of Qatar Foundation.

1 M. Rafieisakhaei and P. R. Kumar are with the Department of Electrical and Computer Engineering, and 2 S. Chakravorty is with the Department of Aerospace Engineering, Texas A&M University, College Station, Texas, 77840 USA. {mrafieis, schakrav, prk}@tamu.edu

978-1-5090-2873-3/17/$31.00 ©2017 IEEE


II. SMALL RANDOM PERTURBATIONS OF A LINEAR SYSTEM

In this section, we consider small noise perturbations of a linear Gaussian system, and state a simple large deviations probability bound for such a system. A general discussion regarding large deviations of the trajectories of a perturbed system from those of its unperturbed counterpart, and related theories, can be found in [14]–[22].

Lemma 1. Large Deviations for a Linear Gaussian System: Let

    x_{t+1} = A_t x_t + ε σ_t w_t,   w_t ~ N(0, Σ_w),   (1)

where x_t ∈ X ⊂ R^{n_x}, x_0 = 0, ε > 0, and Σ_w, σ_t ≻ 0. Then, for each δ > 0, there exist β̄ > 0 and γ̄ > 0 such that

    P(max_{1≤t≤K} ||x_t|| > δ) ≤ K n_x β̄ (ε/δ) exp(−γ̄ δ²/ε²).   (2)

Proof. First note that

    x_t = ε Σ_{s=0}^{t−1} (Π_{r=s+1}^{t−1} A_r) σ_s w_s =: ε Σ_{s=0}^{t−1} Φ_{s,t} w_s,

where Φ_{s,t} := (Π_{r=s+1}^{t−1} A_r) σ_s, 0 ≤ s ≤ t−1, 2 ≤ t ≤ K, and Φ_{0,1} = σ_0. Now, if x_t = (x_t^i), w_t = (w_t^i), 1 ≤ i ≤ n_x, and Φ_{s,t} = (Φ_{s,t}^{ij}), 1 ≤ i, j ≤ n_x, then

    x_t^i = ε Σ_{s=0}^{t−1} Σ_{j=1}^{n_x} Φ_{s,t}^{ij} w_s^j ~ N(0, ε² α_{i,t}²),

where α_{i,t}² := Σ_{s=0}^{t−1} Σ_{j=1}^{n_x} (Φ_{s,t}^{ij})², 1 ≤ i ≤ n_x, 1 ≤ t ≤ K, whence α_{i,t} > 0. Now, let z ~ N(0, 1) be the standard normal random variable. Then, for 0 < δ ≤ u, we have 1 ≤ u/δ, and the tail probability of z is [23]:

    P(z > δ) = ∫_δ^∞ (1/√(2π)) exp(−u²/2) du ≤ ∫_δ^∞ (1/√(2π)) (u/δ) exp(−u²/2) du ≤ (1/(δ√(2π))) exp(−δ²/2).

Hence, we have

    P(z² > δ²) = P(z > δ) + P(z < −δ) ≤ (2/(δ√(2π))) exp(−δ²/2),

    ⇒ P((x_t^i)² > δ²) = P((x_t^i/(ε α_{i,t}))² > δ²/(ε² α_{i,t}²)) ≤ (α_{i,t} ε/δ) √(2/π) exp(−δ²/(2 ε² α_{i,t}²)).

Now, let β̄ := √(2/π) (max_{1≤i≤n_x, 1≤t≤K} α_{i,t}) and γ̄ := 1/(β̄² π), whence β̄, γ̄ > 0. Then,

    P((x_t^i)² > δ²) ≤ β̄ (ε/δ) exp(−γ̄ δ²/ε²),

    ⇒ P(max_{1≤t≤K} ||x_t|| > δ) ≤ Σ_{t=1}^{K} P(||x_t|| > δ) = Σ_{t=1}^{K} P(||x_t||² > δ²) = Σ_{t=1}^{K} P(Σ_{i=1}^{n_x} (x_t^i)² > δ²)
        ≤ Σ_{t=1}^{K} Σ_{i=1}^{n_x} P((x_t^i)² > δ²) ≤ Σ_{t=1}^{K} Σ_{i=1}^{n_x} β̄ (ε/δ) exp(−γ̄ δ²/ε²) = K n_x β̄ (ε/δ) exp(−γ̄ δ²/ε²).

Thus, the probability that the trajectory of x ever exits the tube of radius δ around the nominal zero trajectory over the horizon goes to zero exponentially as ε ↓ 0. We will use this to analyze the optimality of our design in the next section.

III. THE FULLY-OBSERVED SYSTEM

The general stochastic control problem of interest for a fully-observed system can be formulated as an optimization problem in the space of feedback policies. Without loss of generality, we consider discrete-time systems.

Process model: We denote the state and control by x ∈ X ⊂ R^{n_x} and u ∈ U ⊂ R^{n_u}, respectively. Given x_0 ∈ X, the process model with f : X × U → X is defined as:

    x_{t+1} = f(x_t, u_t) + ε σ_t w_t,   w_t ~ N(0, Σ_{w_t}),   (3)

where {w_t} is independent, identically distributed (i.i.d.). Now, we pose the general stochastic control problem [1], [24].

Problem 1. Stochastic Control Problem for a Fully-Observed System: Given an initial state x_0, we wish to determine an optimal or near-optimal policy for

    min_π E[Σ_{t=0}^{K−1} c_t^π(x_t, u_t) + c_K^π(x_K)]
    s.t. x_{t+1} = f(x_t, u_t) + ε σ_t w_t,   (4)

where the optimization is over continuously differentiable Markov, i.e., time-varying state-feedback, policies π ∈ Π, and
• π := {π_0, · · · , π_{K−1}}, π_t : X → U, and u_t = π_t(x_t) specifies the action taken given the state;
• c_t^π(·,·) : X × U → R is the one-step cost function;
• c_K^π(·) : X → R denotes the terminal cost; and
• K > 0 is the time horizon.

IV. T-LQR: TRAJECTORY-OPTIMIZED LQR

In this section, we provide the theoretical basis for our design. The analysis employs the Taylor series expansion of the process model and large deviations theory. We also present the resulting T-LQR design approach and prove its near-optimality.
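The large-deviations bound of Lemma 1, which underpins the analysis in this section, can also be checked numerically. The following sketch is not from the paper: the scalar system, horizon, tube radius, and trial count are invented for illustration. It estimates P(max_{1≤t≤K} |x_t| > δ) by Monte Carlo for decreasing ε:

```python
import random

random.seed(0)

def exit_probability(eps, a=0.9, delta=0.5, K=20, trials=4000):
    """Monte Carlo estimate of P(max_{1<=t<=K} |x_t| > delta) for the
    scalar linear Gaussian system x_{t+1} = a*x_t + eps*w_t, x_0 = 0."""
    exits = 0
    for _ in range(trials):
        x = 0.0
        for _ in range(K):
            x = a * x + eps * random.gauss(0.0, 1.0)
            if abs(x) > delta:
                exits += 1
                break
    return exits / trials

# Lemma 1 predicts decay like (eps/delta) * exp(-gamma * delta^2 / eps^2),
# so halving eps should shrink the exit probability dramatically.
probs = [exit_probability(eps) for eps in (0.4, 0.2, 0.1)]
print(probs)
```

With these illustrative numbers, the estimated exit probabilities drop sharply as ε is halved, consistent with the exp(−γ̄ δ²/ε²) factor in (2).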
A. Preliminaries

Problem 2. Deterministic Open-Loop Problem: Given an initial state x_0, we begin by determining an optimal open-loop sequence for

    min_{u_{0:K−1}} Σ_{t=0}^{K−1} c_t(x_t, u_t) + c_K(x_K)
    s.t. x_{t+1} = f(x_t, u_t).   (5)

Nominal trajectories: For 0 ≤ t ≤ K−1, let u_t^p be the optimal open-loop solution of the deterministic problem above, and let x_t^p be the corresponding state, where

    x_{t+1}^p := f(x_t^p, u_t^p),   (6)

and x_0^p := x_0. We refer to these as the nominal trajectories.

Linearization of the system equations: We consider the application of a control u_t = u_t^p + ũ_t to the stochastic system. Denote the resulting trajectory by x_t = x_t^p + x̃_t, where x̃_t := x_t − x_t^p denotes the state error. Then,

    x_{t+1}^p + x̃_{t+1} = f(x_t^p + x̃_t, u_t^p + ũ_t) + ε σ_t w_t.   (7)

Next, we linearize the drift of the process model around its nominal counterparts. Then, for 0 ≤ t ≤ K−1:

    x̃_{t+1} = A_t x̃_t + B_t ũ_t + G_t w_t + o(||x̃_t|| + ||ũ_t||)   (8a)
             = A_t x̃_t + B_t ũ_t + G_t w_t + o(||x̃||_∞ + ||ũ||_∞),   (8b)

as (||x̃||_∞ + ||ũ||_∞) ↓ 0, where:
• A_t := ∇_x f(x, u)|_{x_t^p, u_t^p}, B_t := ∇_u f(x, u)|_{x_t^p, u_t^p}, G_t := ε σ_t;
• ũ_0 = u_0 − u_0^p = 0, and x̃_0 = x_0 − x_0^p = 0.

The exactly linear l-system: From the system (8) above, we remove the o(·) terms and define an exactly linear system:

    x̃_{t+1}^l := A_t x̃_t^l + B_t ũ_t^l + G_t w_t,   (9)

where x̃_0^l := x̃_0 = 0.

LQR policy: Now we consider the design of an LQR policy for the l-system with the cost:

    min_π E[Σ_{t=0}^{K−1} (x̃_t^l)^T W_t^x x̃_t^l + (ũ_t^l)^T W_t^u ũ_t^l],   (10)

where W_t^u, W_t^x ≻ 0 are positive-definite matrices. This problem results in a policy ũ_t^l = −L_t x̃_t^l, where the linear feedback gain L_t for K−1 ≥ t ≥ 0 can be obtained by:

    L_t = (W_t^u + B_t^T P_{t+1}^f B_t)^{−1} B_t^T P_{t+1}^f A_t,

and the matrix P_t^f is the result of the backward iteration of the dynamic Riccati equation

    P_t^f = A_t^T P_{t+1}^f A_t − A_t^T P_{t+1}^f B_t L_t + W_t^x,

which is solved with the terminal condition P_K^f = W_K^x.

Now, since x̃_t^l is fictitious, we use ũ_t = −L_t x̃_t in the original system. Then (8) can be rewritten as:

    x̃_{t+1} = A_t x̃_t − B_t L_t x̃_t + G_t w_t + o(||x̃||_∞ + ||ũ||_∞)   (11a)
             = A_t x̃_t − B_t L_t x̃_t + G_t w_t + o(||x̃||_∞),   (11b)

and the l-system becomes:

    x̃_{t+1}^l = A_t x̃_t^l − B_t L_t x̃_t^l + G_t w_t.   (12)

The difference d-system: We denote the difference between the two systems (11) and (12) by a superscript d, and define for 0 ≤ t ≤ K−1:

    ũ_t^d := ũ_t − ũ_t^l,   ũ_t^d = −L_t (x̃_t − x̃_t^l),   (13a)
    x̃_{t+1}^d := x̃_{t+1} − x̃_{t+1}^l,   x̃_{t+1}^d = A_t x̃_t^d + B_t ũ_t^d + o(||x̃||_∞),   (13b)

where ũ_0^d = ũ_0 − ũ_0^l = 0 and x̃_0^d = x̃_0 − x̃_0^l = 0. Thus,

    ũ_t^d = −L_t x̃_t^d,
    x̃_{t+1}^d = (A_t − B_t L_t) x̃_t^d + o(||x̃||_∞) = D_t x̃_t^d + o(||x̃||_∞) = D̃_{0:t} x̃_0^d + o(||x̃||_∞) = o(||x̃||_∞),

where D_t := A_t − B_t L_t, 0 ≤ t ≤ K−1, D_0 := A_0, and D̃_{t1:t2} := Π_{t=t1}^{t2} D_t for t2 ≥ t1 ≥ 0, and the identity matrix otherwise. This leads to ũ_t^d = o(||x̃||_∞). Hence,

    O(||x̃^l||_∞) = O(||x̃||_∞) + o(||x̃||_∞) = O(||x̃||_∞),   (14)
    O(||ũ^l||_∞) = O(||x̃^l||_∞) = O(||x̃||_∞),   (15)
    O(||ũ||_∞) = O(||x̃||_∞).   (16)

This means that all the errors in the original system, the l-system, and the d-system are of the order O(||x̃||_∞). Moreover, O(||x̃||_∞) is itself O(||x̃^l||_∞), which we calculate next.

Large deviations: The l-system is a linear Gaussian system with additive noise, for which we use the large deviations Lemma 1, modifying the definition of Φ_{s,t} to Φ_{s,t} := (Π_{r=s+1}^{t−1} D_r) σ_s, 0 ≤ s ≤ t−1, 2 ≤ t ≤ K. Thus, for each finite δ > 0, we have P(max_{0≤t≤K} ||x̃_t^l|| ≥ δ) = o(ε). Let Ω(ε) be the set where max_{0≤t≤K} ||x̃_t^l|| ≤ δ. Then P(Ω(ε)) ≥ 1 − o(ε), and for ω ∈ Ω(ε), ||x̃^l||_∞ = O(δ). Therefore, from the calculations above, we have that O(||x̃||_∞) = O(δ), and hence all the other errors are also O(δ) for ω ∈ Ω(ε). Then, for ω ∈ Ω(ε) and for all 0 ≤ t ≤ K−1,

    u_t = u_t^p + ũ_t^l + O(δ),   (17a)
    x_{t+1} = x_{t+1}^p + x̃_{t+1}^l + O(δ),   (17b)

which means that the linear Gaussian stochastic l-system, along with the deterministic p-system, can be used to control and estimate the original system, given that the O(δ) approximations hold (with probability at least 1 − o(ε)). In another interpretation, the original system can be approximated for all 0 ≤ t ≤ K−1 as:

    u_t = u_t^l + O(δ),   (18a)
    x_{t+1} = x_{t+1}^l + O(δ).   (18b)

B. Main Results

In this section, we quantify the performance obtained from the above design. The proofs are provided in the appendix.
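As an aside, the backward Riccati recursion that produces the gains L_t is straightforward to implement. The following is a minimal scalar-state sketch, where the inverse reduces to a division; the time-varying linearizations A_t, B_t, the weights, and the horizon are made-up illustrative values, not taken from the paper:

```python
def lqr_gains(A, B, Wx, Wu, WxK):
    """Backward Riccati recursion for a time-varying scalar LQR.

    A, B: length-K lists of linearized dynamics along the nominal trajectory;
    Wx, Wu: state/control cost weights; WxK: terminal weight.
    Returns the feedback gains L_0, ..., L_{K-1}.
    """
    K = len(A)
    P = WxK                 # terminal condition P^f_K = W^x_K
    L = [0.0] * K
    for t in reversed(range(K)):
        # L_t = (W^u + B_t P^f_{t+1} B_t)^{-1} B_t P^f_{t+1} A_t
        L[t] = (B[t] * P * A[t]) / (Wu + B[t] * P * B[t])
        # P^f_t = A_t P^f_{t+1} A_t - A_t P^f_{t+1} B_t L_t + W^x
        P = A[t] * P * A[t] - A[t] * P * B[t] * L[t] + Wx
    return L

K = 10
gains = lqr_gains([1.0] * K, [0.5] * K, Wx=1.0, Wu=0.1, WxK=1.0)
print(gains)
```

For these illustrative values, each closed-loop factor D_t = A_t − B_t L_t has magnitude below one, so the tracking error dynamics of (12) are stable.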
Lemma 2. State Error Propagation: For the l-system (12), the state error x̃_{t+1}^l can be written as:

    x̃_{t+1}^l = Σ_{s=0}^{t} D̃_{s,t}^w w_s,   0 ≤ t ≤ K−1,   (19)

where:
• D̃_{s,t}^w := D̃_{s+1:t} G_s, 0 ≤ s ≤ t−1, t ≥ 1; and
• D̃_{t,t}^w := D̃_{t+1:t} G_t = G_t, t ≥ 0.

Proof. Given x̃_0^l = 0, we have:

    x̃_{t+1}^l = A_t x̃_t^l + B_t ũ_t^l + G_t w_t = (A_t − B_t L_t) x̃_t^l + G_t w_t =: D_t x̃_t^l + G_t w_t
              =: D̃_{0:t} x̃_0^l + Σ_{r=0}^{t} D̃_{r+1:t} G_r w_r =: Σ_{s=0}^{t} D̃_{s,t}^w w_s.

The following lemma follows directly by taking into account the feedback law in the result of Lemma 2.

Lemma 3. Control Error Propagation: For the l-system (12), the control error ũ_t^l can be written as

    ũ_t^l = − Σ_{s=0}^{t−1} L_{s,t}^w w_s,   1 ≤ t ≤ K−1,

where L_{s,t}^w := L_t D̃_{s,t−1}^w, t ≥ 1, t−1 ≥ s ≥ 0.

Proof. Note that ũ_0^l = 0. Now, using Lemma 2, we have:

    ũ_t^l = −L_t x̃_t^l = −L_t Σ_{s=0}^{t−1} D̃_{s,t−1}^w w_s =: − Σ_{s=0}^{t−1} L_{s,t}^w w_s.

Next, we linearize the cost function and provide the decoupling principle for a fully-observed system.

Linearization of the cost function: We similarly linearize the cost function around the nominal trajectories of the state and control actions:

    J = J^p + J̃_1 + o(Σ_{t=1}^{K−1} (||x̃_t|| + ||ũ_t||) + ||x̃_K||)   (20a)
      = J^p + J̃_1 + o(||x̃||_∞),   (20b)

where we assume that the cost function is continuously differentiable and bounded; that is, |c_t| ≤ M and |c_K| ≤ M for some M > 0. Moreover:
• J^p := Σ_{t=0}^{K−1} c_t(x_t^p, u_t^p) + c_K(x_K^p) denotes the nominal cost;
• J̃_1 := Σ_{t=0}^{K−1} (C_t^x x̃_t + C_t^u ũ_t) + C_K^x x̃_K is the first-order cost error;
• J_1 := J^p + J̃_1 is the first-order approximation of the cost;
• and C_t^x = ∇_x c_t(x, u)|_{x_t^p, u_t^p}, C_t^u = ∇_u c_t(x, u)|_{x_t^p, u_t^p}, C_K^x = ∇_x c_K(x)|_{x_K^p}.

Therefore, for ω ∈ Ω(ε),

    J = J^p + Σ_{t=0}^{K−1} (C_t^x x̃_t + C_t^u ũ_t) + C_K^x x̃_K + O(δ)   (21a)
      = J^p + Σ_{t=0}^{K−1} (C_t^x x̃_t^l + C_t^u ũ_t^l) + C_K^x x̃_K^l + O(δ).   (21b)

Hence, J − J_1 = O(δ) for ω ∈ Ω(ε).

Next, we provide the main result regarding the expected first-order error of the cost function.

Theorem 1. First-Order Cost Function Error: Given that the process noises are zero-mean i.i.d. Gaussian, under a first-order approximation in the small noise paradigm, the stochastic cost function is dominated by the nominal part of the cost function. Moreover, the expected first-order error is O(δ). That is,

    E[J̃_1] = O(δ), and E[J] = J^p + O(δ).

Proof. Let J̃_1^l := Σ_{t=0}^{K−1} (C_t^x x̃_t^l + C_t^u ũ_t^l) + C_K^x x̃_K^l. Also note that x̃_0 = 0 and E[w_t] = 0 for all t. Then, using Lemmas 2 and 3:

    E[J̃_1^l] = Σ_{t=0}^{K−1} (C_t^x E[x̃_t^l] + C_t^u E[ũ_t^l]) + C_K^x E[x̃_K^l]
             = Σ_{t=0}^{K} C_t^x E[Σ_{s=0}^{t−1} D̃_{s,t−1}^w w_s] + Σ_{t=0}^{K−1} C_t^u E[− Σ_{s=0}^{t−1} L_{s,t}^w w_s]
             = Σ_{t=0}^{K} Σ_{s=0}^{t−1} C_t^x D̃_{s,t−1}^w E[w_s] − Σ_{t=0}^{K−1} Σ_{s=0}^{t−1} C_t^u L_{s,t}^w E[w_s] = 0.

Now, we take expectations on both sides of (21b). Since J ≤ M for ω ∉ Ω(ε), we have

    E[J] = J^p + (1 − o(ε))(E[J̃_1^l] + O(δ)) + M o(ε) = J^p + O(δ) + o(ε).

Now, by fixing δ > 0 and choosing ε > 0 small enough, we have E[J] = J^p + O(δ).

Hence, the expected stochastic cost is equal to the nominal cost with very high probability as ε ↓ 0. Therefore, it follows that the open-loop nominal design can be done decoupled from the closed-loop design, as summarized below:

Corollary 1. Decoupling Principle: Decoupling of the Open-Loop and Closed-Loop Designs Under Small Noise: Based on Theorem 1, under the small noise paradigm, as ε ↓ 0, the design of the feedback law can be conducted decoupled from the design of the open-loop optimized trajectory. Furthermore, this result holds with a probability that exponentially tends to one as ε ↓ 0.

Proof. Using Theorem 1, for ω ∈ Ω(ε) we have E[J] = J^p + O(δ), which is the cost of applying the policy π(x_t) = u_t^p − L_t (x_t − x_t^p) to the stochastic system. Now, suppose π* is the optimal stochastic policy. By assumption, π* is continuously differentiable. Therefore, by modifying the definition of L_t to L_t = ∇_x π_t*(x)|_{x_t^{*p}}, defining u_t^{*p} = π_t*(x_t^{*p}), and replacing p with *p in (6), we have π_t*(x_t) = u_t^{*p} − L_t (x_t − x_t^{*p}) + o(||x̃_t||). Similarly, we modify ũ_t^d = −L_t x̃_t^d + o(||x̃_t||) and use the appropriate modifications, whence the entire calculations of the previous sections hold for this policy. Hence, using Theorem 1 for this system, the cost function of the policy π* can be written as E[J_{π*}] = J^{*p} + O(δ). Now, by construction, J^p ≤ J^{*p}, and

    E[J_{π*}] = J^{*p} + O(δ) ≥ J^p + O(δ) = E[J_π] + O(δ).

As a result, the policy π is within O(δ) of the optimal stochastic policy.
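The decoupling principle can be exercised end to end on a toy problem. In the following sketch, all dynamics, costs, and noise levels are invented for illustration, and the nominal control is a fixed sequence rather than the solution of Problem 2. It builds a nominal trajectory, wraps the tracking law π(x_t) = u_t^p − L_t (x_t − x_t^p) around it, and checks that the average closed-loop cost approaches the nominal cost J^p as ε decreases:

```python
import math
import random

random.seed(1)

def f(x, u):
    """Toy scalar nonlinear drift (illustrative only)."""
    return x + 0.3 * u - 0.1 * math.tanh(x)

K = 15
u_nom = [0.4] * K                        # stand-in for the optimum of Problem 2
x_nom = [0.0]
for t in range(K):
    x_nom.append(f(x_nom[t], u_nom[t]))  # nominal (p-system) trajectory

def cost(xs, us):
    """Illustrative cost: control effort plus terminal tracking error."""
    return 0.1 * sum(u * u for u in us) + (xs[-1] - x_nom[-1]) ** 2

# Linearize along the nominal trajectory: A_t = df/dx, B_t = df/du.
A = [1.0 - 0.1 / math.cosh(x_nom[t]) ** 2 for t in range(K)]
B = [0.3] * K

# Backward Riccati recursion for the tracking gains L_t.
P, L = 1.0, [0.0] * K
for t in reversed(range(K)):
    L[t] = (B[t] * P * A[t]) / (0.1 + B[t] * P * B[t])
    P = A[t] * P * A[t] - A[t] * P * B[t] * L[t] + 1.0

def avg_closed_loop_cost(eps, trials=500):
    """Average cost of the policy pi(x) = u^p - L * (x - x^p)."""
    total = 0.0
    for _ in range(trials):
        xs, us = [0.0], []
        for t in range(K):
            u = u_nom[t] - L[t] * (xs[t] - x_nom[t])
            us.append(u)
            xs.append(f(xs[t], u) + eps * random.gauss(0.0, 1.0))
        total += cost(xs, us)
    return total / trials

J_nom = cost(x_nom, u_nom)
gaps = [abs(avg_closed_loop_cost(eps) - J_nom) for eps in (0.2, 0.05)]
print(J_nom, gaps)
```

As ε shrinks, the gap between the average stochastic cost and the nominal cost contracts, in line with E[J] = J^p + O(δ) in Theorem 1.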
C. Discussion

Remark: This result means that under a small noise assumption, the open-loop nominal trajectory of the system can be designed by replacing the stochastic equations with their nominal counterparts. Then, a decentralized feedback control law can be designed using LQG theory. This design is near-optimal as the intensity of the noise tends to zero. We show in the example below that this design procedure can be used even for moderate levels of noise.

Remark: In Ref. [17], for a special case of nonlinear systems where the process model is linear in the control variable, i.e., f(x_t, u_t) = f_1(x_t) + f_2(x_t) u_t, three results are proven. The first result concerns the ε-optimality of the optimal deterministic law under convexity of J in the control (i.e., v^T (∇_{u,u} J) v ≥ 0, ∀v), and additional smoothness and regularity conditions. The second result concerns the ε²-optimality of the optimal deterministic law under a stronger convexity condition of J in the control (i.e., v^T (∇_{u,u} J) v ≥ c(||u||) ||v||², ∀v, where c(·) : R → R is a monotonically non-increasing positive function), and some smoothness and regularity conditions. The third result concerns the ε-optimality of the optimal deterministic sequence under the latter condition. Our result, on the other hand, provides the ε-optimality of the proposed design approach for a broader class of processes f(x_t, u_t) with nonlinear dependence on the control variable and more general cost functions (most importantly, it does not assume linear dependence on the control sequence). In fact, our simulations are performed for a car-like robot with nonlinear dependence on the control variables.

Feedback control: The proposed approach aims at designing an LQR controller around an optimal nominal underlying trajectory, based on the decoupling principle of Corollary 1 and Theorem 1. As a result, we term this method the Trajectory-optimized LQR (T-LQR). Although we utilize an LQR controller, it is important to note that the decoupling result only assumes a linear form of feedback, and other types of designs [25] can be used as well.

Remark: The computation involved in Problem 2 is of the order of O(K n_x²) per iteration for typically smooth dynamics. Let O(ℓ) denote the order of the number of iterations in the optimizer until convergence. The LQR policy calculation is of order O(K n_x³). Therefore, T-LQR's computations are O(ℓ K n_x² + K n_x³) for a typical process model (such as our example in the next section). The low computational complexity of this approach allows fast replanning in case of deviations during execution. This renders the T-LQR scheme eminently implementable for on-line applications.

Remark: For the specific class of problems considered in [17], the design approach of [17] requires calculation of the optimal control law through intractable dynamic programming. In contrast, the proposed design approach utilizes the more tractable solution via the Maximum Principle, followed by an LQR design. Even implementing the result of [17] through a model predictive approach would require more computation, of at least an order of the planning horizon (from O(ℓK) to O(ℓK²)). In such an implementation, the online computation of the approach of [17] is O(ℓ K n_x²), compared to only O(n_x²) in our algorithm.

V. EXAMPLE

Let us consider a car-like four-wheel robot with process model [26]:

    ẋ = v cos(θ),  ẏ = v sin(θ),  θ̇ = (v/L) tan(φ),   (22)

where (x, y, θ) is the state and (v, φ) is the control input. We suppose that |φ| < φ_max = π/2, |v| ≤ v_max = 0.6, x_0 = (−1.5, 0.5, 0), K = 20, and the time discretization period is 0.7. We incorporate the control constraints and the terminal goal, x_g = (−0.5, 1, 0), in the cost function. Last, the initial control sequence used for the optimization is simply a sequence of zero inputs. The process noise is additive zero-mean Gaussian noise with a standard deviation equal to ε max_t{||u_t||₂}. Figure 1a shows the result of the optimization Problem 2, whereas Fig. 1b shows a typical ground-truth trajectory with ε = 0.1. We have used MATLAB 2016b and its fmincon solver for the simulations.

Fig. 1. Optimized vs. a typical execution trajectory for a car-like robot: (a) the optimized trajectory of Problem 2; (b) a typical ground-truth trajectory with noise standard deviation equal to 10% of the maximum control signal.

Fig. 2. Evolution of the average NMSE as ε ↓ 0 for (a) a feedback-compensated and (b) an open-loop system with the same nominal trajectories.

In the next experiment, we increase ε from 0.001 to 0.1501, in step sizes of 0.001. For each value of ε, we execute
the resulting policy 100 times and compute the average Normalized Mean Squared Error (NMSE) as:

    Average NMSE (%) = (1/100) Σ_{j=1}^{100} (||x^p − x^j||₂² / ||x^p||₂²) × 100,   (23)

where x^p indicates the planned trajectory and x^j indicates the ground-truth trajectory in the j-th experiment. The results of this experiment are shown in Fig. 2a, where the evolution of the average NMSE is depicted for various values of the noise level ε. As indicated in this figure, as ε ↓ 0, the average NMSE tends to zero at an exponential rate, which is consistent with the theory developed in Section II. Moreover, this figure indicates that through feedback compensation, moderate noise levels can be tolerated, rather than just small levels.

Last, Fig. 2b depicts the evolution of the average NMSE for an experiment with the same setting as in Fig. 2a, except that only the open-loop planned control sequence is applied during execution. As predicted by the theory, the error still decreases exponentially as the noise level decreases. However, the rate of convergence is about one-fifth of the previous rate. The results of Fig. 2 show that our design can be used for relatively moderate levels of noise, using the power of feedback.

Remark: In practice, if at any point in the execution the calculated error exceeds a threshold, very rapid replanning can be triggered due to the low computational burden of the optimization problem.

VI. CONCLUSION

We have presented a design approach that decouples the design of the open-loop nominal trajectory and the closed-loop feedback policy for fully-observed nonlinear stochastic systems with Gaussian noise. Our research has extended this result to a multi-agent setting, where the approach also decouples the design of the stochastic part, resulting in decentralized linear control laws; that work will be addressed in another paper. In this paper, we have shown that under a small-noise assumption, the stochastic cost function is dominated by the nominal part of the cost function and the expected first-order linearization error is of mean zero. This results in a tractable, rapid planning method that is provably near-optimal. It can be used in robotic path planning and control, and potentially in other applications.

REFERENCES

[1] P. R. Kumar and P. P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[2] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 1995, vol. 1.
[3] H. Kushner and P. G. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time. Springer Science & Business Media, 2013, vol. 24.
[4] R. Bellman, Dynamic Programming, 1st ed. Princeton, NJ, USA: Princeton University Press, 1957.
[5] C.-S. Chow and J. N. Tsitsiklis, "The complexity of dynamic programming," Journal of Complexity, vol. 5, no. 4, pp. 466–488, 1989.
[6] D. Mayne, "Robust and stochastic MPC: Are we going in the right direction?" IFAC-PapersOnLine, vol. 48, no. 23, pp. 1–8, 2015.
[7] D. Q. Mayne, "Model predictive control: Recent developments and future promise," Automatica, vol. 50, no. 12, pp. 2967–2986, 2014.
[8] J. N. Tsitsiklis, "Computational complexity in Markov decision theory," HERMIS-An International Journal of Computer Mathematics and its Applications, vol. 9, pp. 45–54, 2007.
[9] Y. Le Tallec, "Robust, risk-sensitive, and data-driven control of Markov decision processes," Ph.D. dissertation, Massachusetts Institute of Technology, 2007.
[10] R. E. Kopp, "Pontryagin maximum principle," Mathematics in Science and Engineering, vol. 5, pp. 255–279, 1962.
[11] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming. American Elsevier, 1970.
[12] E. Theodorou, Y. Tassa, and E. Todorov, "Stochastic differential dynamic programming," in American Control Conference (ACC), 2010. IEEE, 2010, pp. 1125–1132.
[13] E. Todorov and W. Li, "A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems," in American Control Conference, 2005. Proceedings of the 2005. IEEE, 2005, pp. 300–306.
[14] M. I. Freidlin and A. D. Wentzell, Random Perturbations. New York, NY: Springer US, 1984, pp. 15–43.
[15] A. D. Wentzell, Limit Theorems on Large Deviations for Markov Stochastic Processes. Springer Science & Business Media, 2012, vol. 38.
[16] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Springer Science & Business Media, 2009, vol. 38.
[17] W. H. Fleming, "Stochastic control for small noise intensities," SIAM Journal on Control, vol. 9, no. 3, pp. 473–517, 1971.
[18] H. Cruz-Suárez and R. Ilhuicatzi-Roldán, "Stochastic optimal control for small noise intensities: The discrete-time case," WSEAS Trans. Math., vol. 9, no. 2, pp. 120–129, Feb. 2010.
[19] J. D. Perkins and R. W. H. Sargent, Nonlinear Optimal Stochastic Control — Some Approximations When the Noise Is Small. Berlin, Heidelberg: Springer Berlin Heidelberg, 1976, pp. 820–830.
[20] J. Perkins and R. Sargent, "Nonlinear optimal stochastic control — some approximations when the noise is small," in IFIP Technical Conference on Optimization Techniques. Springer, 1975, pp. 820–830.
[21] C. J. Holland, "An approximation technique for small noise open-loop control problems," Optimal Control Applications and Methods, vol. 2, no. 1.
[22] S. R. S. Varadhan, Large Deviations and Applications. SIAM, 1984, vol. 46.
[23] cardinal (https://math.stackexchange.com/users/7003/cardinal), "Proof of upper-tail inequality for standard normal distribution," Mathematics Stack Exchange. [Online]. Available: https://math.stackexchange.com/q/28754 (version: 2011-03-24).
[24] D. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, 2007.
[25] P. Kumar et al., "Control: a perspective," Automatica, vol. 50, no. 1, pp. 3–43, 2014.
[26] S. LaValle, Planning Algorithms. Cambridge University Press, 2006.