MODEL PREDICTIVE
CONTROL
Manfred Morari
Jay H. Lee
Carlos E. García
February 8, 2002
Chapter 2
kT < t \le (k+1)T
2.1
Consider two input sequences: v_1(0) a step¹,

v_1(0) = \{1, 1, \ldots, 1, \ldots\}

¹ Throughout this book, we will adopt the convention that the input sequences are zero for negative time, i.e., for the example here v(k) = 0 for k < 0.
and v_2(0) a pulse,

v_2(0) = \{1, 0, \ldots, 0, \ldots\}

Let us assume that for a particular system P these two input sequences give rise to the output sequences y_1(0) and y_2(0), respectively. If the response to a scaled input sequence is the correspondingly scaled output sequence,

v_3(0) = \alpha\, v_1(0) \;\Rightarrow\; y_3(0) = \alpha\, y_1(0)

and the response to a sum of two input sequences is the sum of the two output sequences,

v_4(0) = v_1(0) + v_2(0) \;\Rightarrow\; y_4(0) = y_1(0) + y_2(0)

for arbitrary choices of v_1(0), v_2(0) and \alpha, -\infty < \alpha < +\infty, then the system P is called linear.
Linearity is a very useful property. It has the following implication: If
we know that the input sequences v1 (0), v2 (0), v3 (0), etc. yield the output
sequences y1 (0), y2 (0), y3 (0) etc., respectively, then the response to any linear
combination of input sequences is simply a linear combination of the output
sequences. We also say that the process output can be obtained by superposition.
\alpha_1 v_1(0) + \alpha_2 v_2(0) + \ldots + \alpha_k v_k(0) + \ldots \;\Rightarrow\; \alpha_1 y_1(0) + \alpha_2 y_2(0) + \ldots + \alpha_k y_k(0) + \ldots
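For illustration, the superposition property can be checked numerically; the following Python sketch (our own example, with an arbitrary impulse response h and helper fir_response, none of which appear in the text) verifies it for an FIR system:

```python
import numpy as np

def fir_response(h, v):
    """Output of an FIR system with impulse response h driven by the
    input sequence v (inputs are zero for negative time)."""
    return np.convolve(v, h)[:len(v)]

h = np.array([0.0, 0.5, 0.3, 0.1])      # illustrative impulse response
v1 = np.ones(10)                         # step input
v2 = np.zeros(10); v2[0] = 1.0           # pulse input
a1, a2 = 2.0, -0.7                       # arbitrary scaling factors

lhs = fir_response(h, a1 * v1 + a2 * v2)
rhs = a1 * fir_response(h, v1) + a2 * fir_response(h, v2)
assert np.allclose(lhs, rhs)             # superposition holds
```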
Consider next the shifted input sequence

v_1(\ell) = \{\underbrace{0, \ldots, 0}_{\ell}, 1, 1, \ldots\}

where the first \ell elements are zero. (Here we have simply moved the signal v_1(0) to the right by \ell time steps.) If the resulting system output sequence is shifted in the same manner, i.e., y_1(\ell) consists of the elements of y_1(0) delayed by \ell steps, and if the correspondence

v(\ell) \;\Rightarrow\; y(\ell)

holds for arbitrary input sequences v(0) and arbitrary integers \ell, then the
system is referred to as time-invariant. Time invariance implies, for example,
that if tomorrow the same input sequence is applied to the plant as today,
then the same output sequence will result. Throughout this part of the book
we will assume that linear time-invariant models are adequate for the systems
we are trying to describe and control.
2.2 System Stability
Let us perturb the input v to a system in a step-wise fashion and observe the
output y. We can distinguish the three types of system behavior depicted in
Fig. 2.2.
A stable system settles to a steady state after an input perturbation.
Asymptotically the output does not change. An unstable system does not
settle, its output continues to change. The output of an integrating system
approaches a ramp asymptotically; it changes in a linear fashion. The output
of an exponentially unstable system changes in an exponential manner.
Figure 2.2: Output responses to a step input change for stable, integrating
and unstable systems.
Most processes encountered in the process industries are stable. An important example of an integrating process is shown in Fig. 2.3B.
At the outlet of the tank a controller is installed which delivers a constant
flow. If a step change occurs in the inflow, the level will rise in a ramp-like
fashion. (Or in the case of a pulse change shown, the level changes to a different
value on a permanent basis.)
Some chemical reactors are exponentially unstable when left uncontrolled.
However, for safety reasons, reactors are usually designed to be stable so that
there is no danger of runaway when the control system fails. The control
concepts discussed in this part of the book are not applicable to exponentially
unstable systems. This system type will be dealt with in the advanced part.
2.3

Assume that the response of the system to a unit pulse input is the impulse response sequence

y(0) = \{0, h_1, h_2, \ldots, h_n, 0, 0, \ldots\}   (2.1)

which settles to zero after n steps (a finite impulse response, FIR). An arbitrary input sequence
February 8, 2002
can be represented as a sum of impulses
v(0) = \{1, 0, 0, \ldots\}\, v(0)
\;+\; \{0, 1, 0, \ldots\}\, v(1)
\;+\; \{0, 0, 1, 0, \ldots\}\, v(2)
\;+\; \ldots

By linearity and time invariance, the output is the sum of the correspondingly shifted impulse responses, weighted by the input values:

y(0) = \{0, h_1, h_2, \ldots, h_n, 0, 0, \ldots\}\, v(0)
\;+\; \{0, 0, h_1, h_2, \ldots, h_n, 0, 0, \ldots\}\, v(1)
\;+\; \{0, 0, 0, h_1, h_2, \ldots, h_n, 0, 0, \ldots\}\, v(2)
\;+\; \ldots

so that the output at any time k is

y(k) = \sum_{i=1}^{n} h_i\, v(k-i).   (2.2)
2.4

Similarly, let the response of the system to a unit step input \{1, 1, 1, \ldots\} be the step response sequence

y(0) = \{0, s_1, s_2, s_3, \ldots\}   (2.3)

Because the impulse response vanishes after n steps, the step response settles,

s_n = s_{n+1} = s_{n+2} = \ldots   (2.4)

An arbitrary input sequence can be represented as a sum of steps with step heights

\Delta v(i) = v(i) - v(i-1)   (2.5)

v(0) = \{1, 1, 1, 1, \ldots\}\, v(0)
\;+\; \{0, 1, 1, 1, \ldots\}\, (v(1) - v(0))
\;+\; \{0, 0, 1, 1, \ldots\}\, (v(2) - v(1))
\;+\; \ldots   (2.6)

where each step height \Delta v(i) is the change of the input from one time step to the next. The system output is obtained by summing the step responses weighted by their respective step heights \Delta v(i):

y(0) = \{0, s_1, \ldots, s_n, s_n, s_n, \ldots\}\, \Delta v(0)
\;+\; \{0, 0, s_1, \ldots, s_n, s_n, s_n, \ldots\}\, \Delta v(1)
\;+\; \{0, 0, 0, s_1, \ldots, s_n, s_n, s_n, \ldots\}\, \Delta v(2)
\;+\; \ldots

so that the output at time k is

y(k) = \sum_{i=1}^{n-1} s_i\, \Delta v(k-i) + s_n\, v(k-n).   (2.7)

Comparing with (2.2), the step and impulse response coefficients are related by

s_k = \sum_{i=1}^{k} h_i   (2.8)

h_k = s_k - s_{k-1}.   (2.9)
2.5 Multi-Step Prediction
The objective of the controller is to determine the control action such that
a desirable output behavior results in the future. Thus we need to be able
to predict efficiently the future output behavior of the system. This future
behavior will be a function of past inputs to the process and future inputs,
i.e., inputs which we are considering to take in the future. We will separate the
effects of the past inputs and the future inputs on the output. All past input
information will be summarized in the dynamic state of the system. Thus the
future output behavior will be determined by the present system state and the
present and future inputs to the system.
Usually the representation of the state is not unique. For example, for an
FIR system we could choose the past n inputs as the state x.
x(k) = \left[ v(k-1),\, v(k-2),\, \ldots,\, v(k-n) \right]^T   (2.10)
Clearly this state summarizes all relevant past input information for an FIR
system and allows us to compute the future evolution of the system when we
are given the present input v(k) and the future inputs v(k + 1), v(k + 2), . . ..
There are other possible choices. For example, instead of the n past inputs
we could choose the effect of the past inputs on the future outputs at the next
n steps as the state. In other words, we define as the state the present output
and the n − 1 future outputs assuming that the present and future inputs
are zero. The two states are equivalent in that they both have dimension n
and are related uniquely by a linear map.
The latter choice of state will prove more convenient for predictive control
computations. It shows explicitly how the system will evolve when there is no
control action and therefore allows us to determine easily what control action
should be taken to achieve a specified behavior of the outputs in the future.
In the next sections we will discuss how to do multi-step prediction for FIR
and step-response models based on this state representation.
2.5.1 FIR Model

Define the state as

Y(k) = \left[ y_0(k),\, y_1(k),\, \ldots,\, y_{n-1}(k) \right]^T   (2.11)

where

y_i(k) = \sum_{j=i+1}^{n} h_j\, v(k+i-j).   (2.12)

Thus we have defined the ith system state y_i(k) as the system output at time k + i under the assumption that the system inputs are zero from time k into the future (v(k+j) = 0, j \ge 0). This state completely characterizes the evolution of the system output under the assumption that the present and future
inputs are zero. In order to determine the future output we simply add to the
state the effect of the present and future inputs using (2.2).
y(k+1) = y_1(k) + h_1\, v(k)   (2.13)

y(k+2) = y_2(k) + h_2\, v(k) + h_1\, v(k+1)   (2.14)

y(k+3) = y_3(k) + h_3\, v(k) + h_2\, v(k+1) + h_1\, v(k+2)   (2.15)

y(k+4) = \ldots   (2.16)
Collecting these predictions for the next p time steps in matrix form,

\begin{bmatrix} y(k+1) \\ y(k+2) \\ \vdots \\ y(k+p) \end{bmatrix}
=
\underbrace{\begin{bmatrix} y_1(k) \\ y_2(k) \\ \vdots \\ y_p(k) \end{bmatrix}}_{\text{from } Y(k)\text{: effect of past inputs}}
+
\underbrace{\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_p \end{bmatrix} v(k)
+ \begin{bmatrix} 0 \\ h_1 \\ \vdots \\ h_{p-1} \end{bmatrix} v(k+1)
+ \ldots
+ \begin{bmatrix} 0 \\ \vdots \\ 0 \\ h_1 \end{bmatrix} v(k+p-1)}_{\text{effect of future inputs (yet to be determined)}}
and note that the first term is a part of the state and reflects the effect of
the past inputs. The other terms express the effect of the hypothesized future
inputs. They are simply the responses to impulses occurring at the future time
steps.
In order to obtain the state at k + 1, which according to the definition is

Y(k+1) = \left[ y_0(k+1),\, y_1(k+1),\, \ldots,\, y_{n-1}(k+1) \right]^T   (2.17)

with

y_i(k+1) = \sum_{j=i+1}^{n} h_j\, v(k+1+i-j),   (2.18)
we need to add the effect of the input v(k) at time k to the state Y (k).
y_0(k+1) = y_1(k) + h_1\, v(k)   (2.19)

y_1(k+1) = y_2(k) + h_2\, v(k)   (2.20)

\vdots

y_{n-1}(k+1) = y_n(k) + h_n\, v(k)   (2.21)

Collect the impulse response coefficients in the vector

H = \left[ h_1,\, h_2,\, \ldots,\, h_n \right]^T.   (2.22)
We note that yn (k) was not a part of the state at time k, but we know it to
be 0 because of the FIR assumption. By defining the matrix
M = \begin{bmatrix}
0 & 1 & 0 & \ldots & \ldots & 0 \\
0 & 0 & 1 & 0 & \ldots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & \ldots & \ldots & 0 & 1 \\
0 & 0 & \ldots & \ldots & 0 & 0
\end{bmatrix}   (2.23)

the relations (2.19)-(2.21) can be written compactly as

Y(k+1) = M\, Y(k) + H\, v(k).   (2.24)
Multiplication with the matrix M in the above represents the simple operation
of shifting the vector Y (k) and setting the last element of the resulting vector
to zero.
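A minimal Python sketch of the update (2.24) for a SISO system (function and variable names are ours, not from the text):

```python
import numpy as np

def fir_state_update(Y, v, h):
    """One step of (2.24): Y(k+1) = M Y(k) + H v(k).  M shifts the
    state up by one position and zeroes the last element, which is
    known to be zero under the FIR assumption."""
    h = np.asarray(h, dtype=float)
    Y_next = np.empty_like(h)
    Y_next[:-1] = Y[1:]                  # shift
    Y_next[-1] = 0.0                     # FIR: y_n(k) = 0
    return Y_next + h * v                # add effect of input v(k)
```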
2.5.2 Step Response Model

In terms of the step response coefficients we define the state as

Y(k) = \left[ \tilde{y}_0(k),\, \tilde{y}_1(k),\, \ldots,\, \tilde{y}_{n-1}(k) \right]^T   (2.25)

where

\tilde{y}_i(k) = y(k+i) \quad \text{assuming } \Delta v(k+j) = 0,\; j \ge 0.   (2.26)

Thus, in this case, we have defined the state \tilde{y}_i(k) as the system output at time k + i under the assumption that the input changes are zero from time k into the future (\Delta v(k+i) = 0, i \ge 0). Note that because of the FIR assumption the step response settles after n steps, i.e., \tilde{y}_{n-1}(k) = \tilde{y}_n(k) = \ldots = \tilde{y}_{\infty}(k).
Hence, the choice of state Y (k) completely characterizes the evolution of the
system output under the assumption that the present and future input changes
are zero. In order to determine the future output we simply add to the state
the effect of the present and future input changes. From (2.7) we find
y(k+1) = \sum_{i=1}^{n-1} s_i\, \Delta v(k+1-i) + s_n\, v(k+1-n)   (2.27)

= \tilde{y}_1(k) + s_1\, \Delta v(k).   (2.28)

Similarly,

y(k+2) = \tilde{y}_2(k) + s_2\, \Delta v(k) + s_1\, \Delta v(k+1)   (2.29)

y(k+3) = \tilde{y}_3(k) + s_3\, \Delta v(k) + s_2\, \Delta v(k+1) + s_1\, \Delta v(k+2)   (2.30)

y(k+4) = \ldots   (2.31)
In matrix form,

\begin{bmatrix} y(k+1) \\ y(k+2) \\ \vdots \\ y(k+p) \end{bmatrix}
=
\underbrace{\begin{bmatrix} \tilde{y}_1(k) \\ \tilde{y}_2(k) \\ \vdots \\ \tilde{y}_p(k) \end{bmatrix}}_{\text{from } Y(k)\text{: effect of past inputs}}
+
\underbrace{\begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_p \end{bmatrix} \Delta v(k)
+ \begin{bmatrix} 0 \\ s_1 \\ \vdots \\ s_{p-1} \end{bmatrix} \Delta v(k+1)
+ \ldots
+ \begin{bmatrix} 0 \\ \vdots \\ 0 \\ s_1 \end{bmatrix} \Delta v(k+p-1)}_{\text{effect of future input changes (yet to be determined)}}   (2.32)
and note that the first term is a part of the state and reflects the effect of
the past inputs. The other terms express the effect of the hypothesized future
input changes. They are simply the responses to steps occurring at the future
time steps.
In order to obtain the state at k + 1, which according to the definition is

Y(k+1) = \left[ \tilde{y}_0(k+1),\, \tilde{y}_1(k+1),\, \ldots,\, \tilde{y}_{n-1}(k+1) \right]^T   (2.33)

with

\tilde{y}_i(k+1) = y(k+1+i) \quad \text{assuming } \Delta v(k+1+j) = 0,\; j \ge 0,   (2.34)

we need to add the effect of the input change \Delta v(k) at time k to the state Y(k):

\tilde{y}_0(k+1) = \tilde{y}_1(k) + s_1\, \Delta v(k)   (2.35)

\tilde{y}_1(k+1) = \tilde{y}_2(k) + s_2\, \Delta v(k)   (2.36)

\vdots

\tilde{y}_{n-1}(k+1) = \tilde{y}_n(k) + s_n\, \Delta v(k)   (2.37)

We note that \tilde{y}_n(k) = \tilde{y}_{n-1}(k) because of the FIR (settling) assumption. By defining the matrix
M = \begin{bmatrix}
0 & 1 & 0 & \ldots & \ldots & 0 \\
0 & 0 & 1 & 0 & \ldots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & \ldots & \ldots & 0 & 1 \\
0 & 0 & \ldots & \ldots & 0 & 1
\end{bmatrix}   (2.38)

and the vector

S = \left[ s_1,\, s_2,\, \ldots,\, s_n \right]^T   (2.39)

these relations can be written compactly as

Y(k+1) = M\, Y(k) + S\, \Delta v(k).   (2.40)
Multiplication with the matrix M denotes the operation of shifting the vector
Y (k) and repeating the last element. The recursive relation (2.40) is referred
to as the step response model of the system.
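The analogous Python sketch for the step response update (2.40), again SISO and with illustrative names of our own:

```python
import numpy as np

def step_state_update(Y, dv, s):
    """One step of (2.40): Y(k+1) = M Y(k) + S dv(k).  Here M shifts
    the state up and repeats the last element, since the step response
    has settled after n steps."""
    s = np.asarray(s, dtype=float)
    Y_next = np.empty_like(s)
    Y_next[:-1] = Y[1:]                  # shift
    Y_next[-1] = Y[-1]                   # repeat last element
    return Y_next + s * dv               # add effect of the move dv(k)
```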
As is apparent from the derivation, the FIR and the step response models
are very similar. The definition of the state is slightly different. In the FIR
model the future inputs are assumed to be zero, in the step response model the
future input changes are kept zero. Also the input representation is different.
For the FIR model the future inputs are given in terms of pulses, for the step
response model the future inputs are steps. Because the step response model
expresses the future inputs in terms of changes \Delta v, it will be very convenient
for incorporating integral action in the controller as we will show.
2.5.3
Multivariable Generalization
The model equation (2.40) generalizes readily to the case when the system has
n_y outputs y_\ell, \ell = 1, \ldots, n_y, and n_v inputs v_j, j = 1, \ldots, n_v. We define the output vector

y(k-1) = \left[ y_1(k-1),\, \ldots,\, y_{n_y}(k-1) \right]^T,

input vector v(k-1) = \left[ v_1(k-1),\, \ldots,\, v_{n_v}(k-1) \right]^T and step response coefficient matrix
coefficient matrix
S_i = \begin{bmatrix}
s_{1,1,i} & s_{1,2,i} & \ldots & s_{1,n_v,i} \\
s_{2,1,i} & s_{2,2,i} & \ldots & s_{2,n_v,i} \\
\vdots & & & \vdots \\
s_{n_y,1,i} & s_{n_y,2,i} & \ldots & s_{n_y,n_v,i}
\end{bmatrix}

where s_{\ell,m,i} is the ith step response coefficient relating the mth input to the \ellth output. The (n_y \cdot n) \times n_v step response matrix is obtained by stacking up the step response coefficient matrices

S = \left[ S_1^T,\, S_2^T,\, S_3^T,\, \ldots,\, S_n^T \right]^T.
The state of the multiple output system at time k is defined as

Y(k) = \left[ \tilde{y}_0^T(k),\, \tilde{y}_1^T(k),\, \ldots,\, \tilde{y}_{n-1}^T(k) \right]^T   (2.41)

with \tilde{y}_i(k) the vector of the n_y outputs at time k + i under the assumption that all present and future input changes are zero.   (2.42)
Here M is defined as

M = \begin{bmatrix}
0 & I & 0 & \ldots & \ldots & 0 \\
0 & 0 & I & 0 & \ldots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & \ldots & \ldots & 0 & I \\
0 & 0 & \ldots & \ldots & 0 & I
\end{bmatrix}

where I is the n_y \times n_y identity matrix.
2.6 Examples
Figure 2.4: Responses of side endpoint for step change in side draw (left) and
intermediate reflux duty (right). (n = 35, T = 7)
Figure 2.6: Response of bottom and top composition of a high purity distillation column to step changes in boilup and reflux (n = 30, T = 10).
2.7 Identification
Two approaches are available for obtaining the models needed for prediction
and control move computation. One can derive the differential equations representing the various material, energy and momentum balances and solve these
equations numerically. Thus one can simulate the response of the system to a
step input change and obtain a step response model. This fundamental modeling approach is quite complex and requires much engineering effort, because
the physicochemical phenomena have to be understood and all the process
parameters have to be known or determined from experiments. The main advantage of the procedure is that the resulting differential-equation models are
usually valid over wide ranges of operating conditions.
The second approach to modeling is to perturb the inputs of the real process and record the output responses. By relating the inputs and outputs a
process model can be derived. This approach is referred to as process identification. Especially in situations when the process is complex and the fundamental
phenomena are not well understood, experimental process identification is the
preferred modeling approach in industry.
In this section, we will discuss the direct identification of an FIR model.
The primary advantage of fitting an FIR model is that the only parameters
to be specified by the user are the sampling time and the response length n.
This is in contrast with other techniques which use more structured models
and thus require that a structure identification step be performed first.
2.7.1 Settling Time
Very few physical processes are actually FIR but the step response coefficients
become essentially constant after some time. Thus, we have to determine by
inspection a time beyond which the step response coefficients do not change
appreciably and define an appropriate truncated step response model which is
FIR. If we choose this time too short the model is in error and the performance
of a controller based on this model will suffer. On the other hand, if we choose
this time too long the model will include many step response coefficients, which
makes the prediction and control computation unwieldy.
2.7.2 Sampling Time
2.7.3
The simplest test is to step up or step down the inputs to obtain the step response coefficients. When one tries to determine the impulse response directly
from step experiments one is faced with several experimental problems.
At the beginning of the experiment the plant has to be at steady state
which is often difficult to accomplish because of disturbances over which the
experimenter has no control. A typical response starting from a nonzero
initial condition is shown in Fig. 2.8 B. Disturbances can also occur during
the experiment leading to responses like that in Fig. 2.8 C.
Finally, there might be so much noise that the step response is difficult
to recognize Fig. 2.8 D. It is then necessary to increase the magnitude of the
input step and thus to increase the signal-to-noise ratio of the output signal.
Large steps, however, are frowned upon by the operating personnel, who are
worried about, for example, off-spec products when the operating conditions
are disturbed too much.
In the presence of significant noise, disturbances and non-steady-state initial conditions, it is better to choose input signals that are more random in
nature. This idea of randomness of signals will be made more precise in the
advanced section of the book. Input signals that are more random should
help reduce the adverse effects due to non-steady-state initial conditions, disturbances and noise. However, we recommend that any engineer try a step test first, since it is the easiest experiment.
Figure 2.8: (A) True step response. Erroneous step responses caused by (B) non-steady-state initial conditions, (C) unmeasured disturbances, and (D) measurement noise.
2.7.4
In order to explain the basic technique to identify an FIR model for arbitrary
input signals, we need to introduce some basic concepts of linear algebra. Let
us assume that the following system of linear equations is given:
b = Ax
(2.43)
where the matrix A has more rows than columns. That is, there exist more equations than unknowns; the linear system is overspecified. Let us
also assume that A has full column rank. This means that the rank of A is
equal to the dimension of the solution vector x.
This system of equations has no exact solution because there are not
enough degrees of freedom to satisfy all equations simultaneously. However,
one can find a solution that makes the left hand and right hand sides close
to each other. One way is to find a vector x that minimizes the sum of squares
of the equation residuals. Let us denote the vector of residuals as
\epsilon = b - Ax.   (2.44)

The sum of squares of the residuals is

\Phi = \epsilon^T \epsilon = (b - Ax)^T (b - Ax)   (2.45)

= b^T b - 2\, x^T A^T b + x^T A^T A x.   (2.46)

Setting the gradient to zero,

\frac{d\Phi}{dx} = -2 A^T b + 2 A^T A x = 0.   (2.47)

The second derivative

\frac{d^2 \Phi}{dx^2} = 2 A^T A   (2.48)

is positive definite because of the rank condition of A, and thus the solution of (2.47) for x,

x = (A^T A)^{-1} A^T b,   (2.49)

minimizes \Phi; the corresponding minimum residual is

\epsilon = \left( I - A (A^T A)^{-1} A^T \right) b.   (2.50)
Note that formulas (2.49) and (2.50) are algebraically correct, but are
never used for numerical computations. Most available software employs the
QR algorithm for determining the least squares solution. The reader is referred
to the appropriate numerical analysis literature for more information.
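For example, library least-squares routines avoid forming A^T A explicitly (NumPy's lstsq is SVD-based; other packages use QR). A small sketch with randomly generated illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))       # overdetermined, full column rank
b = rng.standard_normal(100)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # (2.49): poorly conditioned
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # orthogonalization-based
assert np.allclose(x_normal, x_ls)
```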
2.7.5
The least squares technique just described can be used to fit a model relating one output variable of a process to multiple inputs (the same
algorithm is used for all outputs from the same test data). When many inputs
vary simultaneously they must be uncorrelated so that the individual models
can be obtained. Directly fitting an FIR model provides many advantages to an
inexperienced user. For example, apart from the settling time of the process,
it requires essentially no other a priori information about the process. Here we
give a brief derivation of the least squares identification technique and some
modifications to overcome the drawbacks of directly fitting an FIR model.
Let us assume the process has a single output y and a set of inputs vj, j =
1, . . . , nv . Let us also assume that the process is described by an impulse
response model as follows:
y(k) = \sum_{j=1}^{n_v} \sum_{i=1}^{n} h_{j,i}\, v_j(k-i) + w(k)   (2.51)

or, after differencing,

\Delta y(k) = \sum_{j=1}^{n_v} \sum_{i=1}^{n} h_{j,i}\, \Delta v_j(k-i) + \epsilon(k), \qquad \epsilon(k) = w(k) - w(k-1).   (2.52)
(This model can be obtained directly by writing (2.51) for the times k and k − 1 and differencing.) We assume that \epsilon is independent of, or uncorrelated with, \Delta v.
If we collect the values of the output and inputs over N +n intervals of time
from an on-line experiment we can estimate the impulse response coefficients
of the model as follows. We can write (2.52) over the past N intervals of time
up to the current time N + n:
\underbrace{\begin{bmatrix} \Delta y(n+1) \\ \Delta y(n+2) \\ \vdots \\ \Delta y(n+N) \end{bmatrix}}_{Y_N}
=
\underbrace{\begin{bmatrix}
\Delta v_1(n) & \ldots & \Delta v_1(1) & \Delta v_2(n) & \ldots & \Delta v_{n_v}(1) \\
\Delta v_1(n+1) & \ldots & \Delta v_1(2) & \Delta v_2(n+1) & \ldots & \Delta v_{n_v}(2) \\
\vdots & & \vdots & \vdots & & \vdots \\
\Delta v_1(n+N-1) & \ldots & \Delta v_1(N) & \Delta v_2(n+N-1) & \ldots & \Delta v_{n_v}(N)
\end{bmatrix}}_{V_N}
\underbrace{\begin{bmatrix} h_{1,1} \\ \vdots \\ h_{1,n} \\ h_{2,1} \\ \vdots \\ h_{n_v,n} \end{bmatrix}}_{\theta}
+
\underbrace{\begin{bmatrix} \epsilon(n+1) \\ \epsilon(n+2) \\ \vdots \\ \epsilon(n+N) \end{bmatrix}}_{E_N}   (2.53)
where Y_N is the vector of past output measurements, V_N is the matrix containing all past input measurements, and \theta is the vector of parameters to be identified. If the number of data points N is larger than the total number of parameters in \theta (n_v \cdot n), the least squares estimate of \theta for N data points, that is the choice of \theta that minimizes the quantity (Y_N - V_N \theta)^T (Y_N - V_N \theta), can be found from (2.49):

\theta_N = \left[ V_N^T V_N \right]^{-1} V_N^T Y_N.   (2.56)
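A sketch of how the regression matrix V_N of (2.53) can be assembled from differenced data (Python/NumPy; all names and the data-alignment convention are our own):

```python
import numpy as np

def build_fir_regression(dy, dv, n):
    """Stack the differenced FIR model (2.52) row by row: for each time
    t, the regressors are dv_j(t-1), ..., dv_j(t-n) for every input j.
    dy: (N+n,) differenced output; dv: (N+n, n_v) differenced inputs,
    with dy[t] and dv[t] aligned in time."""
    nv = dv.shape[1]
    Y_N = dy[n:]
    V_N = np.zeros((len(Y_N), nv * n))
    for k, t in enumerate(range(n, len(dy))):
        for j in range(nv):
            # most recent difference first: dv_j(t-1) down to dv_j(t-n)
            V_N[k, j*n:(j+1)*n] = dv[t-n:t, j][::-1]
    return Y_N, V_N

# Least squares estimate (2.56), via a numerically robust routine:
# theta, *_ = np.linalg.lstsq(V_N, Y_N, rcond=None)
```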
If the equation residuals are weighted with a matrix \Lambda, the solution follows from:

\theta_N = \left[ V_N^T \Lambda^T \Lambda V_N \right]^{-1} V_N^T \Lambda^T \Lambda Y_N.   (2.58)
It can be shown that if the underlying system is truly linear and n is large enough so that \epsilon and \Delta v are uncorrelated, this estimate is unbiased (i.e., it is expected to be right on the average). Also, the estimate converges
to the true value as the number of data points N becomes large, under some
mild condition. Reliable software exists to obtain the least squares estimates
of the parameters .
A major drawback of this approach is that a large number of data points
needs to be collected because of the many parameters to be fitted. This is
required because in the presence of noise the variance of the parameters could
be so large as to render the fit useless. Often, the resulting step response will
be non-smooth with many sharp peaks. One simple approach to alleviate this
difficulty is to add a penalty term on the magnitude of the changes of the step
response coefficients, i.e. the impulse response coefficients, to be identified.
In other words, \theta is found such that the quantity (Y_N - V_N \theta)^T (Y_N - V_N \theta) + \theta^T \Gamma^T \Gamma \theta is minimized, where \Gamma is a weighting matrix penalizing the magnitudes of the impulse response coefficients; in other words, the weight penalizes sharp changes in the step response coefficients. As before, this can be formulated as a least-squares problem

\begin{bmatrix} Y_N \\ 0 \end{bmatrix} = \begin{bmatrix} V_N \\ \Gamma \end{bmatrix} \theta   (2.59)

whose solution follows from (2.49):

\theta_N = \left[ V_N^T V_N + \Gamma^T \Gamma \right]^{-1} V_N^T Y_N.   (2.60)
This simple modification to the standard least-squares identification algorithm should result in smoother step responses. One drawback of the method is that the optimal choice of the weighting matrix \Gamma is often unclear. Choosing too large a \Gamma can lead to severely biased estimates even with large data sets. On the other hand, too small a choice of \Gamma may not smooth the step response sufficiently. Other more sophisticated statistical methods to reduce
the error variance of the parameters will be discussed in the advanced part of
this book.
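A sketch of the penalized fit (2.59), here with the simplest choice \Gamma = \gamma I (a scalar ridge penalty; the text leaves the structure of \Gamma open):

```python
import numpy as np

def smoothed_fir_fit(Y_N, V_N, gamma):
    """Penalized least squares (2.59): minimize
    ||Y_N - V_N theta||^2 + ||Gamma theta||^2 with Gamma = gamma * I,
    solved by stacking the penalty rows under the data rows."""
    n_par = V_N.shape[1]
    A = np.vstack([V_N, gamma * np.eye(n_par)])
    b = np.concatenate([Y_N, np.zeros(n_par)])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta
```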
The procedure above can also be used to fit measured disturbance models to be used for feedforward compensation in MPC. Designing the manipulated inputs such that they are uncorrelated with the disturbance should
minimize problems when fitting disturbance models. Of course, the measured
disturbance must also have enough natural excitation, which may be hard to
guarantee.
Chapter 3
3.1
Consider the diagram in Fig. 3.1. At the present time k the behavior of the
process over a horizon p is considered. Using the model, the response of the
process output to changes in the manipulated variable is predicted. Current
and future moves of the manipulated variables are selected such that the predicted response has certain desirable (or optimal) characteristics. For instance,
a commonly used objective is to minimize the sum of squares of the future
errors, i.e., the deviations of the controlled variable from a desired target (setpoint). This minimization can also take into account constraints which may
be present on the manipulated variables and the outputs.
The idea is appealing but would not work very well in practice if the moves
of the manipulated variable determined at time k were applied blindly over
the future horizon. Disturbances and modelling errors may lead to deviations
between the predicted behavior and the actual observed behavior, so that
the computed manipulated variable moves may not be appropriate any more.
Therefore only the first one of the computed moves is actually implemented. At
the next time step k + 1 a measurement is taken, the horizon is shifted forward
by one step, and the optimization is done again over this shifted horizon based
on the current system information. Therefore this control strategy is also
referred to as moving horizon control.
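In pseudocode-like Python, the moving horizon policy amounts to the following loop (plant and optimize_moves are illustrative placeholders, not part of the text):

```python
def receding_horizon_control(plant, optimize_moves, n_steps):
    """Skeleton of the moving horizon idea: re-optimize at every sample
    over the shifted horizon, but implement only the first move."""
    for k in range(n_steps):
        y_meas = plant.measure()            # new information at time k
        du_plan = optimize_moves(y_meas)    # plan over shifted horizon
        plant.apply(du_plan[0])             # implement first move only
```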
A similar strategy is used in many other non-technical situations. One
example is computer chess where the computer moves after evaluating all
possible moves over a specified depth (the horizon). At the next turn the
evaluation is repeated based on the current board situation. Another example
would be investment planning. A five-year plan is established to maximize the
return. Periodically a new five year plan is put together over a shifted horizon
to take into account changes which have occurred in the economy.
The DMC algorithm includes as one of its major components a technique
to predict the future output of the system as a function of the inputs and
disturbances. This prediction capability is necessary to determine the optimal
future control inputs and was outlined in the previous chapter. Afterwards
we will state the objective function, formulate the optimization problem and
comment on its solution. Finally we will discuss the various tuning parameters
which are available to the user to affect the performance of the controller.
Figure 3.2: Basic Problem Setup
3.2 Multi-Step Prediction
We consider the setup depicted in Fig. 3.2 where we have three different types
of external inputs: the manipulated variable (MV) u, whose effect on the
output, usually a controlled variable (CV), is described by Pu ; the measured
disturbance variable (DV) d whose effect on the output is described by Pd ;
and finally the unmeasured and unmodeled disturbances wy which add a bias
to the system output. The overall system can be described by
y(k) = \begin{bmatrix} P_u & P_d \end{bmatrix} \begin{bmatrix} u(k) \\ d(k) \end{bmatrix} + w_y(k)   (3.1)
We assume that step response models S u , S d are available for the system
dynamics Pu and Pd , respectively. We can define the overall multivariable
step response model
S = \begin{bmatrix} S^u & S^d \end{bmatrix}   (3.2)
which is driven by the known overall input
v(k) = \begin{bmatrix} u(k) \\ d(k) \end{bmatrix}.   (3.3)

As before, the state is

Y(k) = \left[ \tilde{y}_0(k),\, \tilde{y}_1(k),\, \ldots,\, \tilde{y}_{n-1}(k) \right]^T   (3.4)

= \begin{bmatrix} y(k) \\ y(k+1) \\ \vdots \\ y(k+n-1) \end{bmatrix}   (3.5)
obtained under the assumption that the system inputs do not change from the
previous values, i.e.,
\Delta u(k) = \Delta u(k+1) = \ldots = 0

\Delta d(k) = \Delta d(k+1) = \ldots = 0.   (3.6)
Also, the state does not include any unmeasured disturbance information and hence it is assumed in the definition that

w_y(k) = w_y(k+1) = \ldots = 0.   (3.7)

From (2.40), the state is updated according to

Y(k) = M\, Y(k-1) + S\, \Delta v(k-1).   (3.8)
The equation reflects the effect of the input change \Delta v(k-1) on the future evolution of the system assuming that there are no further input changes.
The influence of the input change manifests itself through the step response
matrix S. The effect of any future input changes is described as well by the
appropriate step response matrix. Let us consider the predicted output over
the next p time steps:

\begin{bmatrix} y(k+1|k) \\ y(k+2|k) \\ \vdots \\ y(k+p|k) \end{bmatrix}
=
\begin{bmatrix} \tilde{y}_1(k) \\ \tilde{y}_2(k) \\ \vdots \\ \tilde{y}_p(k) \end{bmatrix}
+ \begin{bmatrix} S_1^u \\ S_2^u \\ \vdots \\ S_p^u \end{bmatrix} \Delta u(k|k)
+ \begin{bmatrix} 0 \\ S_1^u \\ \vdots \\ S_{p-1}^u \end{bmatrix} \Delta u(k+1|k)
+ \ldots
+ \begin{bmatrix} 0 \\ \vdots \\ 0 \\ S_1^u \end{bmatrix} \Delta u(k+p-1|k)
+ \begin{bmatrix} S_1^d \\ S_2^d \\ \vdots \\ S_p^d \end{bmatrix} \Delta d(k)
+ \begin{bmatrix} 0 \\ S_1^d \\ \vdots \\ S_{p-1}^d \end{bmatrix} \Delta d(k+1|k)
+ \ldots
+ \begin{bmatrix} w_y(k+1|k) \\ w_y(k+2|k) \\ \vdots \\ w_y(k+p|k) \end{bmatrix}   (3.9)
Here the first term on the right hand side, the first p elements of the state,
describes the future evolution of the system when all the future input changes
are zero. The remaining terms describe the effect of the present and future changes of the manipulated inputs \Delta u(k+i|k), the measured disturbances \Delta d(k+i|k), and the unmeasured and unmodeled disturbances w_y(k+i|k). The notation y(k+i|k) represents the prediction of y(k+i) made based on the information available at time k. The same notation applies to d and w_y. The values of most of these variables are not available at time k and have to be predicted in a rational fashion. From the measurement at time k, d(k) is known and therefore \Delta d(k) = d(k) - d(k-1). Unless some additional process information or upstream measurements are available to conclude about the future disturbance behavior, the disturbances are assumed not to change in
the future:

\Delta d(k+1|k) = \ldots = \Delta d(k+p-1|k) = 0.   (3.10)

This assumption is reasonable when the disturbances are varying only infrequently. Similarly, we will assume that the future unmodeled disturbances w_y(k+i|k) do not change:

w_y(k+i|k) = w_y(k|k), \quad i = 1, \ldots, p   (3.11)

and estimate the current unmodeled disturbance from the most recent measurement:

w_y(k|k) = y_m(k) - \tilde{y}_0(k)   (3.12)
where ym (k) represents the value of the output as actually measured in the
plant. Here \tilde{y}_0(k), the first component of the state Y(k), is the model prediction of the output at time k (assuming w_y(k) = 0) based on the information
up to this time. The difference between this predicted output and the measurement provides a good estimate of the unmodeled disturbance.
For generality we want to consider the case where the manipulated inputs are not varied over the whole horizon p but only over the next m steps (\Delta u(k|k), \Delta u(k+1|k), \ldots, \Delta u(k+m-1|k)) and that the input changes are set to zero after that:

\Delta u(k+i|k) = 0, \quad i = m, \ldots, p-1.   (3.13)
With these assumptions (3.9) becomes

Y(k+1|k) =
\underbrace{\begin{bmatrix} \tilde{y}_1(k) \\ \tilde{y}_2(k) \\ \vdots \\ \tilde{y}_p(k) \end{bmatrix}}_{M Y(k)\text{: from the memory}}
+ \underbrace{\begin{bmatrix} S_1^d \\ S_2^d \\ \vdots \\ S_p^d \end{bmatrix} \Delta d(k)}_{S^d \Delta d(k)\text{: feedforward term}}
+ \underbrace{\begin{bmatrix} y_m(k) - \tilde{y}_0(k) \\ y_m(k) - \tilde{y}_0(k) \\ \vdots \\ y_m(k) - \tilde{y}_0(k) \end{bmatrix}}_{I_p (y_m(k) - \tilde{y}_0(k))\text{: feedback term}}
+ \underbrace{\begin{bmatrix}
S_1^u & 0 & \ldots & 0 \\
S_2^u & S_1^u & \ldots & 0 \\
\vdots & & \ddots & \vdots \\
S_m^u & S_{m-1}^u & \ldots & S_1^u \\
\vdots & & & \vdots \\
S_p^u & S_{p-1}^u & \ldots & S_{p-m+1}^u
\end{bmatrix}}_{S^u\text{: dynamic matrix}}
\underbrace{\begin{bmatrix} \Delta u(k|k) \\ \Delta u(k+1|k) \\ \vdots \\ \Delta u(k+m-1|k) \end{bmatrix}}_{\Delta U(k)\text{: future input moves}}   (3.14)
where

Y(k+1|k) = \begin{bmatrix} y(k+1|k) \\ y(k+2|k) \\ \vdots \\ y(k+p|k) \end{bmatrix}   (3.15)

S^u = \begin{bmatrix}
S_1^u & 0 & \ldots & 0 \\
S_2^u & S_1^u & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
S_m^u & S_{m-1}^u & \ldots & S_1^u \\
\vdots & \vdots & & \vdots \\
S_p^u & S_{p-1}^u & \ldots & S_{p-m+1}^u
\end{bmatrix}   (3.16)

S^d = \begin{bmatrix} S_1^d \\ S_2^d \\ \vdots \\ S_p^d \end{bmatrix}   (3.17)

\Delta U(k) = \begin{bmatrix} \Delta u(k|k) \\ \Delta u(k+1|k) \\ \vdots \\ \Delta u(k+m-1|k) \end{bmatrix}, \qquad
I_p = \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \; (p \text{ blocks})   (3.18)

and M is the p \times n block shift matrix, i.e., block row i of M\,Y(k) equals \tilde{y}_i(k) for i \le n-1 and \tilde{y}_{n-1}(k) for i \ge n-1 (for p \le n simply the first p block rows are taken).   (3.19)

With these definitions the prediction equation reads compactly

Y(k+1|k) = M\, Y(k) + S^d \Delta d(k) + I_p \left( y_m(k) - \tilde{y}_0(k) \right) + S^u \Delta U(k)   (3.20)
where the first three terms are completely defined by past control actions (Y(k), \tilde{y}_0(k)) and present measurements (y_m(k), \Delta d(k)) and the last term describes the effect of future manipulated variable moves \Delta U(k).
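For a SISO system the dynamic matrix S^u of (3.16) is lower triangular Toeplitz; a small NumPy sketch (names are ours):

```python
import numpy as np

def dynamic_matrix(s, p, m):
    """Dynamic matrix S^u of (3.16), SISO: entry (i, j) is s_{i-j+1};
    the step response is held at its settled value s_n beyond n steps."""
    s = np.asarray(s, dtype=float)
    s_ext = np.concatenate([s, np.full(p, s[-1])])
    Su = np.zeros((p, m))
    for j in range(m):
        Su[j:, j] = s_ext[:p - j]
    return Su

# Prediction (3.20) in the SISO case, where I_p is a vector of ones:
# Y_pred = M @ Y + Sd * dd + np.ones(p) * (ym - y0) + Su @ dU
```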
This prediction equation can be easily adjusted if different assumptions are
made on the future behavior of the measured and unmeasured disturbances.
For instance, if the disturbances are expected to evolve in a ramp-like fashion then we would set

\Delta d(k) = \Delta d(k+1|k) = \ldots = \Delta d(k+p-1|k)   (3.21)

and

w_y(k+i|k) = w_y(k|k) + i \left( w_y(k|k) - w_y(k-1|k-1) \right).   (3.22)
3.3 Objective Function

In the simplest (single-output) case the objective is to minimize the sum of squared deviations of the predicted outputs from a reference trajectory r:

\min_{\Delta u(k|k) \ldots \Delta u(k+m-1|k)} \sum_{\ell=1}^{p} \left[ y(k+\ell|k) - r(k+\ell) \right]^2   (3.23)

For multiple outputs the deviations are weighted with matrices \Gamma_\ell^y:

\min_{\Delta u(k|k) \ldots \Delta u(k+m-1|k)} \sum_{\ell=1}^{p} \left\| \Gamma_\ell^y \left[ y(k+\ell|k) - r(k+\ell) \right] \right\|^2   (3.24)

For example, for a system with two outputs y_1 and y_2, and constant diagonal weight matrices of the form

\Gamma_\ell^y = \begin{bmatrix} \gamma_1 & 0 \\ 0 & \gamma_2 \end{bmatrix}, \quad \forall \ell,   (3.25)
the objective becomes

\min_{\Delta u(k|k) \ldots \Delta u(k+m-1|k)} \left\{ \gamma_1^2 \sum_{\ell=1}^{p} \left[ y_1(k+\ell|k) - r_1(k+\ell) \right]^2 + \gamma_2^2 \sum_{\ell=1}^{p} \left[ y_2(k+\ell|k) - r_2(k+\ell) \right]^2 \right\}   (3.26)
Thus, the larger the weight is for a particular output, the larger is the contribution of its sum of squared deviations to the objective. This will make the
controller bring the corresponding output closer to its reference trajectory.
Finally, the manipulated variable moves that make the output follow a
given trajectory could be too severe to be acceptable in practice. This can be
corrected by adding a penalty term for the manipulated variable moves to the
objective as follows:
\min_{\Delta U(k)} \sum_{\ell=1}^{p} \left\| \Gamma_\ell^y \left[ y(k+\ell|k) - r(k+\ell) \right] \right\|^2 + \sum_{\ell=1}^{m} \left\| \Gamma_\ell^u\, \Delta u(k+\ell-1|k) \right\|^2   (3.27)
Note that the larger the elements of the matrix \Gamma_\ell^u are, the smaller the resulting moves will be, and consequently, the output trajectories will not be followed as closely. Thus, the relative magnitudes of \Gamma_\ell^y and \Gamma_\ell^u determine the trade-off between following the trajectory closely and reducing the action of the manipulated variables.
Of course, not every practical performance criterion is faithfully represented by this quadratic objective. However, many control problems can be
formulated as trajectory tracking problems and therefore this formulation is
very useful. Most importantly this formulation leads to an optimization problem for which there exist effective solution techniques.
3.4 Constraints
3.4.1 Manipulated Variable Constraints
The solution vector of DMC contains not only the current moves to be implemented but also the moves for the future m intervals of time. Although
violations can be avoided by constraining only the move to be implemented,
constraints on future moves can be used to allow the algorithm to anticipate
and prevent future violations thus producing a better overall response. The
manipulated variable value at a future time k + \ell is constrained to be

u_{low}(\ell) \le u(k-1) + \sum_{j=0}^{\ell} \Delta u(k+j|k) \le u_{high}(\ell), \quad \ell = 0, \ldots, m-1.   (3.28)

In terms of \Delta U(k) these constraints read

\begin{bmatrix} I_L \\ -I_L \end{bmatrix} \Delta U(k) \le
\begin{bmatrix}
u_{high}(0) - u(k-1) \\ \vdots \\ u_{high}(m-1) - u(k-1) \\
u(k-1) - u_{low}(0) \\ \vdots \\ u(k-1) - u_{low}(m-1)
\end{bmatrix}

where

I_L = \begin{bmatrix}
I & 0 & \ldots & 0 \\
I & I & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
I & I & \ldots & I
\end{bmatrix}.   (3.29)
3.4.2 Manipulated Variable Rate Constraints
Often MPC is used in a supervisory mode where there are limitations on the
rate at which lower level controller setpoints are moved. These are enforced
by adding constraints on the manipulated variable move sizes:
\begin{bmatrix} I \\ -I \end{bmatrix} \Delta U(k) \le
\begin{bmatrix}
\Delta u_{max}(0) \\ \vdots \\ \Delta u_{max}(m-1) \\
\Delta u_{max}(0) \\ \vdots \\ \Delta u_{max}(m-1)
\end{bmatrix}   (3.30)

where \Delta u_{max}(\ell) > 0 is the possibly time varying bound on the magnitude of the moves.
3.4.3 Output Variable Constraints
The algorithm can make use of the output predictions (3.20) to anticipate future constraint violations:

Y_{low} \le Y(k+1|k) \le Y_{high}   (3.31)

Substituting the prediction equation (3.20), these become linear constraints on the moves:

\begin{bmatrix} S^u \\ -S^u \end{bmatrix} \Delta U(k) \le
\begin{bmatrix}
Y_{high} - M Y(k) - S^d \Delta d(k) - I_p \left( y_m(k) - \tilde{y}_0(k) \right) \\
-Y_{low} + M Y(k) + S^d \Delta d(k) + I_p \left( y_m(k) - \tilde{y}_0(k) \right)
\end{bmatrix}   (3.32)

where

Y_{low} = \begin{bmatrix} y_{low}(1) \\ y_{low}(2) \\ \vdots \\ y_{low}(p) \end{bmatrix}, \qquad
Y_{high} = \begin{bmatrix} y_{high}(1) \\ y_{high}(2) \\ \vdots \\ y_{high}(p) \end{bmatrix}

are vectors of output constraint trajectories y_{low}(\ell), y_{high}(\ell) over the horizon length p.
3.4.4 Combined Constraints
The manipulated variable constraints (3.28), manipulated variable rate constraints (3.30) and output variable constraints (3.32) can be combined into
one convenient expression
C^u\, \Delta U(k) \le C(k+1|k)   (3.33)
where C^u combines all the matrices on the left hand side of the inequalities as follows:

C^u = \begin{bmatrix} I_L \\ -I_L \\ I \\ -I \\ S^u \\ -S^u \end{bmatrix}.   (3.34)

The vector C(k+1|k) on the right hand side collects all the error vectors of the constraint equations as follows:

C(k+1|k) = \begin{bmatrix}
u_{high}(0) - u(k-1) \\ \vdots \\ u_{high}(m-1) - u(k-1) \\
u(k-1) - u_{low}(0) \\ \vdots \\ u(k-1) - u_{low}(m-1) \\
\Delta u_{max}(0) \\ \vdots \\ \Delta u_{max}(m-1) \\
\Delta u_{max}(0) \\ \vdots \\ \Delta u_{max}(m-1) \\
Y_{high} - M Y(k) - S^d \Delta d(k) - I_p \left( y_m(k) - \tilde{y}_0(k) \right) \\
-Y_{low} + M Y(k) + S^d \Delta d(k) + I_p \left( y_m(k) - \tilde{y}_0(k) \right)
\end{bmatrix}   (3.35)
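A sketch of assembling (3.33)-(3.35) for a SISO system (all names are ours; bounds are passed as stacked vectors, and y_free denotes the predicted output with \Delta U = 0, i.e., the first three terms of (3.20)):

```python
import numpy as np

def dmc_constraints(m, Su, u_prev, u_low, u_high, du_max, y_low, y_high, y_free):
    """Stack (3.28), (3.30) and (3.32) into C^u dU <= c, eq. (3.33)-(3.35)."""
    IL = np.tril(np.ones((m, m)))           # lower-triangular of ones (3.29)
    I = np.eye(m)
    Cu = np.vstack([IL, -IL, I, -I, Su, -Su])
    c = np.concatenate([
        u_high - u_prev,                    # absolute MV upper limits
        u_prev - u_low,                     # absolute MV lower limits
        du_max,                             # move size upper limits
        du_max,                             # move size lower limits
        y_high - y_free,                    # output upper limits
        y_free - y_low,                     # output lower limits
    ])
    return Cu, c
```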
3.5
3.5.1
The standard QP has the form

\min_x \; x^T H x - g^T x \quad \text{s.t.} \quad C x \le c   (3.36)

where H is a symmetric matrix called the Hessian matrix, g is the gradient vector, and C and c define the linear inequality constraints.
The least squares problem (2.43)-(2.45) can be written as a QP:

\min_x \| b - Ax \|^2 = \min_x \; x^T A^T A x - 2 b^T A x + b^T b   (3.37)

Thus the QP is

\min_x \; x^T A^T A x - 2 b^T A x

yielding

H = A^T A, \qquad g = 2 A^T b.

In order to obtain the unique unconstrained solution

x = \frac{1}{2} H^{-1} g

H must be positive definite, which is the same condition required in Section 2.7.4.
When H = 0 the problem reduces to

\min_x \; -g^T x \quad \text{s.t.} \quad C x \le c   (3.38)
which is a Linear Programming (LP) problem. The solution of an LP will always lie at a constraint. This is not necessarily true of QP solutions. Although
not a requirement, more efficient QP algorithms are available for problems with a positive definite H. For example, parametric QP algorithms employ the pre-inverted Hessian in their computations, thus reducing the computational requirements [?, ?].
3.5.2

The DMC objective (3.27) can be written compactly as

\min_{\Delta U(k)} \sum_{\ell=1}^{p} \left\| \Gamma_\ell^y \left[ y(k+\ell|k) - r(k+\ell) \right] \right\|^2 + \sum_{\ell=1}^{m} \left\| \Gamma_\ell^u\, \Delta u(k+\ell-1|k) \right\|^2   (3.39)

= \min_{\Delta U(k)} \left\| \Gamma^y \left[ Y(k+1|k) - R(k+1) \right] \right\|^2 + \left\| \Gamma^u\, \Delta U(k) \right\|^2   (3.40)

\text{s.t.} \quad C^u\, \Delta U(k) \le C(k+1|k)   (3.41)

where

\Gamma^u = \text{diag}\left\{ \Gamma_1^u, \ldots, \Gamma_m^u \right\}   (3.42)

\Gamma^y = \text{diag}\left\{ \Gamma_1^y, \ldots, \Gamma_p^y \right\}   (3.43)

and

R(k+1) = \begin{bmatrix} r(k+1) \\ r(k+2) \\ \vdots \\ r(k+p) \end{bmatrix}.   (3.44)

Substituting the prediction equation (3.20) and expanding the squares yields

\left\| \Gamma^y \left[ Y(k+1|k) - R(k+1) \right] \right\|^2 + \left\| \Gamma^u \Delta U(k) \right\|^2   (3.45)

= \Delta U^T(k) \left( S^{uT} \Gamma^{yT} \Gamma^y S^u + \Gamma^{uT} \Gamma^u \right) \Delta U(k)   (3.46)

\;-\; 2\, E_p^T(k+1|k)\, \Gamma^{yT} \Gamma^y S^u\, \Delta U(k) + E_p^T(k+1|k)\, \Gamma^{yT} \Gamma^y\, E_p(k+1|k)   (3.47)

where
E_p(k+1|k) = \begin{bmatrix} e(k+1|k) \\ e(k+2|k) \\ \vdots \\ e(k+p|k) \end{bmatrix}
= R(k+1) - M Y(k) - S^d \Delta d(k) - I_p \left( y_m(k) - \tilde{y}_0(k) \right)   (3.48)
which is the measurement corrected vector of future output deviations from the
reference trajectory (i.e., errors), assuming that all future control moves are
zero. Note that this vector includes the effect of the measurable disturbances (S^d \Delta d(k)) on the prediction.
The optimization problem with a quadratic objective and linear inequalities which we have defined is a Quadratic Program. By converting to the standard QP formulation the DMC problem becomes²:

\min_{\Delta U(k)} \; \Delta U^T(k)\, H^u\, \Delta U(k) - g^T(k+1|k)\, \Delta U(k)   (3.49)

where

H^u = S^{uT} \Gamma^{yT} \Gamma^y S^u + \Gamma^{uT} \Gamma^u   (3.50)

g(k+1|k) = 2\, S^{uT} \Gamma^{yT} \Gamma^y\, E_p(k+1|k).   (3.51)

² The term E_p^T(k+1|k)\, \Gamma^{yT} \Gamma^y\, E_p(k+1|k) is independent of \Delta U(k) and can be removed from the objective function.

3.6 Implementation
3.6.1
1. Initialization: if the plant starts from rest, set

Y(0) = \left[ y_m(0),\, y_m(0),\, \ldots,\, y_m(0) \right]^T;   (3.52)

otherwise compute Y(0) from the record of the past n input moves,   (3.53)

where the first element of Y(k), y_0(k|k), is the model prediction of the output y_m(k) at time k.
4. Obtain Measurements: Obtain measurements (ym (k), d(k)).
5. Compute the reference trajectory error vector

E_p(k+1|k) = R(k+1) - M Y(k) - S^d \Delta d(k) - I_p \left( y_m(k) - y_0(k|k) \right)   (3.55)
If (3.52) is used for initialization and changes in the past n inputs did actually occur, then the initial operation of the algorithm will not be smooth. The transfer from manual to automatic will introduce a disturbance; it will not be bumpless.
7. Compute the constraint vector

C(k+1|k) = \begin{bmatrix}
u_{high}(0) - u(k-1) \\ \vdots \\ u_{high}(m-1) - u(k-1) \\
u(k-1) - u_{low}(0) \\ \vdots \\ u(k-1) - u_{low}(m-1) \\
\Delta u_{max}(0) \\ \vdots \\ \Delta u_{max}(m-1) \\
\Delta u_{max}(0) \\ \vdots \\ \Delta u_{max}(m-1) \\
Y_{high} - M Y(k) - S^d \Delta d(k) - I_p \left( y_m(k) - y_0(k|k) \right) \\
-Y_{low} + M Y(k) + S^d \Delta d(k) + I_p \left( y_m(k) - y_0(k|k) \right)
\end{bmatrix}   (3.56)
8. Solve the QP

\min_{\Delta U(k)} \; \tfrac{1}{2}\, \Delta U^T(k)\, H^u\, \Delta U(k) - g^T(k+1|k)\, \Delta U(k) \quad \text{s.t.} \quad C^u\, \Delta U(k) \le C(k+1|k)   (3.57)
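As an illustration, the QP of step 8 can be solved with a general-purpose NLP solver; a sketch using SciPy's SLSQP (all names are ours; production implementations use dedicated QP codes, as discussed in the next section):

```python
import numpy as np
from scipy.optimize import minimize

def solve_dmc_qp(Hu, g, Cu, c):
    """Solve (3.57): min 0.5 dU'Hu dU - g'dU  s.t.  Cu dU <= c."""
    m = Hu.shape[0]
    obj = lambda dU: 0.5 * dU @ Hu @ dU - g @ dU
    jac = lambda dU: Hu @ dU - g
    cons = [{"type": "ineq",                # SLSQP requires fun(dU) >= 0
             "fun": lambda dU: c - Cu @ dU,
             "jac": lambda dU: -Cu}]
    res = minimize(obj, np.zeros(m), jac=jac,
                   constraints=cons, method="SLSQP")
    return res.x                            # only res.x[0] is implemented
```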
3.6.2 Solving the QP
The KKT condition is a necessary condition for the solution to a general constrained
optimization problem. For QP, it is a necessary and sufficient condition.
3.6.3
Consider, for illustration, a system whose first \nu step response coefficients are zero (a time delay of \nu samples): S_1^u = \ldots = S_\nu^u = 0. The output constraint inequalities then have the structure

\begin{bmatrix}
0 & 0 & \ldots \\
\vdots & \vdots & \\
0 & 0 & \ldots \\
S_{\nu+1}^u & 0 & \ldots \\
S_{\nu+2}^u & S_{\nu+1}^u & \ldots \\
\vdots & \vdots &
\end{bmatrix} \Delta U(k) \le
\begin{bmatrix}
c(k+1|k) \\ \vdots \\ c(k+\nu|k) \\ c(k+\nu+1|k) \\ c(k+\nu+2|k) \\ \vdots
\end{bmatrix}
Figure 3.3: Constraint window: the output constraint y_{max} is relaxed between k+1 and k+H_c.
If any of the elements c(k+1|k), \ldots, c(k+\nu|k) is negative, a violation is projected that cannot be prevented by changing the manipulated variables (\Delta U(k) \ne 0): since the corresponding coefficients in the left hand side matrix are zero, the inequalities cannot be satisfied and the QP is infeasible. Of course, this problem can be removed by simply not including these initial inequalities in the QP.
Because inequalities are dealt with exactly by the QP, the corrective action
against a projected violation is equivalent to that generated by a very tightly
tuned controller. As a result, the moves produced by the QP to correct for
violations may be undesirably severe (even when feasible). Both infeasibilities
and severe moves can be dealt with in various ways.
One way is to include a constraint window on the output constraints
similar to what we suggested above for computational savings. For each output
a time k +Hc in the future is chosen at which constraint violations will start to
be checked (Fig. 3.3). For the illustration above, this time should be picked to
be at least equal to \nu + 1. This allows the algorithm to check for violations after
the effects of deadtimes and inverse responses have passed. For each situation
there is a minimal value of Hc necessary for feasibility. If this minimal value is
chosen large, constraint violations may occur over a significant period of time.
In many cases, if a larger value of Hc is chosen, smaller constraint violations
may occur over a longer time interval. Thus, there is a trade-off between
magnitude and duration of constraint violation.
In general, it is difficult to select a value of Hc for each constrained output
such that the proper compromise is achieved. Furthermore, in multivariable
cases, constraints may need to be relaxed according to the priorities of the constrained variables. The selection of constraint windows is greatly complicated
by the fact that appropriate amount and location for relaxation are usually
time-dependent due to varying disturbances and occurrences of actuator and
sensor failures. Therefore it is usually preferred to soften the constraint by
3.6.4
On one hand, the prediction horizon p and the control horizon m should be
kept short to reduce the computational effort; on the other hand, they should
be made long to prevent short-sighted control policies. Making m short is
generally conservative because we are imposing constraints (forcing the control
to be constant after m steps) which do not exist in the actual implementation
because of the moving horizon policy. Therefore a small m will tend to give
rise to a cautious control action.
Choosing p small is short-sighted and will generally lead to an aggressive
control action. If constraint violations are checked only over a small prediction horizon p, this policy may lead the system into a dead end from which it can escape only with difficulty, i.e., only with large constraint violations and/or
large manipulated variable moves.
When p and m are infinite and when there are no disturbance changes
and unknown inputs, the sequence of control moves determined at time k is
the same sequence which is realized through the moving horizon policy. In
this sense our control actions are truly optimal. When the horizon lengths are
shortened, then the sequence of moves determined by the optimizer and the
sequence of moves actually implemented on the system will become increasingly different. Thus the short time objective which is optimized will have less
and less to do with the actual value of the objective realized when the moving
horizon control is implemented. This may be undesirable.
Figure 3.4: Choosing the horizon (the last move occurs at k+m-1 and an FIR system settles by k+m+n-1).
In general, we should try to choose a small m to keep the computational
effort manageable, but large enough to give us a sufficient number of degrees of
freedom. We should choose p as large as possible, possibly p = \infty, to completely
capture the consequences of the control actions. This is possible in several
ways. Because an FIR system will settle after m + n steps, choosing a horizon
p = m + n is a sensible choice used in many commercial systems (Fig. 3.4).
Instead or in addition we can impose a large output penalty at the end of the
prediction horizon forcing the system effectively to settle to zero at the end of
the horizon. Then, with p = m+n, the error after m+n is essentially zero and
there is little difference between the finite and the infinite horizon objective.
3.6.5 Input Blocking
As said, use of a large control horizon is generally preferred from the viewpoint
of performance but available computational resource may limit its size. One
way to relax this limit is through a procedure called blocking, which allows the user to block out the input moves at selected locations from the calculation by setting them to zero a priori. The result is a reduction in the number of input moves that need to be computed through the optimization,
hopefully without a significant sacrifice in the solution quality. Obviously,
judicious selection of blocking locations is critical for achieving the intended
effect. The selection is done mostly on an ad hoc basis, though there are some
qualitative rules like blocking less of the immediate moves and more of the
distant ones.
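One simple way to implement blocking is through a selection matrix that maps the reduced move vector to the full one; an illustrative sketch (the text does not prescribe this particular parameterization):

```python
import numpy as np

def blocking_matrix(m, free_moves):
    """Blocking: moves are free only at the listed time indices and are
    held at zero elsewhere, so dU = B dU_b with fewer unknowns."""
    B = np.zeros((m, len(free_moves)))
    for col, idx in enumerate(free_moves):
        B[idx, col] = 1.0
    return B

# e.g. with m = 10, allow moves only at steps 0, 1, 2, 4 and 7:
# B = blocking_matrix(10, [0, 1, 2, 4, 7]);  Su_blocked = Su @ B
```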
3.6.6
3.7 Examples
INCLUDE!
Effect of constraints on response PC examples of slowdown with IMC
Stability-contrast with unconstrained case (EZ)
Example from MPC course demonstrating positive effect of constraint.
3.8
3.8.1 Reference Trajectories
In DMC, output deviation from the desired setpoint is penalized in the optimization. Other algorithms like IDCOM, HIECON, and PFC let the user
specify not only where the output should go but also how. For this, a reference
trajectory is introduced for each controlled variable (CV), which is typically
defined as a first-order path from the current output value to the desired setpoint. The time constant of the path can be adjusted according to the speed
of the desired closed-loop response. This is displayed in Figure ??.
Reference trajectories provide an intuitive way to control the aggressiveness of the controller, which in DMC is adjusted through the weighting matrix for the input move penalty term. One could argue that the controller's aggressiveness is more conveniently tuned by specifying the speed of the output response rather than through input weight parameters, whose effects on the speed of response are highly system-dependent.
3.8.2 Coincidence Points
Some commercial algorithms like IDCOM and PFC allowed the option of penalizing the output error only at a few chosen points in the prediction horizon called coincidence points. This is motivated primarily by the reduction in computation it brings. When the number of input moves has to be kept small (in order to keep the computational burden low), use of a large prediction horizon, which is sometimes necessary due to large inverse responses and long dynamics, results in a sluggish control behavior. This problem can be obviated by penalizing the output deviation only at a few carefully selected points. At the
extreme, one could ask the output to match the reference trajectory value at
a single time point, which can be achieved with a single control move. Such a formulation was used, for example, in IDCOM-M, an offspring of the original
IDCOM algorithm, marketed by Setpoint.
Clearly, the choice of coincidence points is critical for performance, especially when the number of points used is small. Though some guidelines exist
on choosing these points, there is no systematic method for the selection. Because the response time of different outputs can vary significantly, coincidence
points are usually defined separately for each output.
3.8.3
The RMPCT algorithm differs from other MPC algorithms in that it attempts to keep each controlled output within a user-specified zone called a funnel, rather than to keep it on a specific reference trajectory. The typical shape of a funnel is displayed in Figure ??. The user sets the maximum and minimum limits and also the slope of the funnel through a parameter called the performance ratio,
which is the desired time to return to the limit zone divided by the open-loop
response time. The gap between the maximum and minimum can be closed
for exact setpoint control, or left open for range control.
The algorithm solves the following quadratic program at each time:

\min_{y^r, \Delta u} \sum_{i=1}^{p} \left\| y(k+i|k) - y^r(k+i|k) \right\|_Q^2 + \sum_{j=0}^{m-1} \left\| \Delta u(k+j|k) \right\|_R^2   (3.59)

or

\min_{y^r, \Delta u} \sum_{i=1}^{p} \left\| y(k+i|k) - y^r(k+i|k) \right\|_Q^2 + \sum_{j=0}^{m-1} \left\| u(k+j|k) - u^r \right\|_R^2   (3.60)

subject to

y_{min}^f(k+i|k) \le y^r(k+i|k) \le y_{max}^f(k+i|k)   (3.61)

where y_{min}^f(k+i|k) and y_{max}^f(k+i|k) represent the lower and upper limit values of the funnel for k+i in the prediction horizon as specified at time k, and u^r is the desired settling value for the input. Notice that the reference trajectory y^r is a free parameter, which is optimized to lie within the funnel. Typically, Q \gg R in order to keep the outputs within the funnel as much as possible. Then one can think of the above as a multi-objective optimization, in which the primary objective is to minimize the funnel constraint violation by the output and the secondary objective is to minimize the size of input movement (or input deviation from the desired settling value in the case of (3.60)). In this case, as long as there exists an input trajectory that keeps the output within the
INCLUDE!
Table 3.1: Optimization Resulting from Use of Different Spatial and Temporal
Norms
funnel, the first penalty term will be made exactly zero. Typically, there will
be an infinite number of solutions that achieve this, leading to a degenerate
QP. The algorithm thus finds the minimum norm solution, which corresponds
to the least amount of input adjustment, hence the name Robust MPCT.
However, if there is no input that can keep the output within the funnel, the
first term will be the primary factor that determines the input.
The use of a funnel is motivated by the fact that, in multivariable systems, the shape of desirable trajectories for the outputs is not always clear due to the system interaction. Thus, it is argued that an attractive formulation is to let
the user specify an acceptable dynamic zone for each output as a funnel and
then find the minimum size input moves (or inputs with minimum deviation
from their desired values) that keep the outputs within the zone or, if not
possible, minimize the extent of violation.
3.8.4
In defining the objective function, use of norms other than the 2-norm is certainly possible. For example, the possibility of using the 1-norm (sum of absolute values) has been explored to a great extent. Use of the infinity norm has also been investigated, with the aim of minimizing the worst-case deviation over time. In both cases one gets a Linear Program, for which a plethora of theory and software exists due to its significance in economics. However, one difficulty with these formulations is in tuning. This is because the solution of an LP lies at the intersection of binding constraints and it can switch abruptly from one vertex to another as one varies tuning parameters (such as the input weight parameters). The solution behavior of a QP is much smoother and therefore it is a preferred formulation for control.
Table 3.1 summarizes the optimization that results from various combinations of spatial and temporal norms.
3.8.5 Input Parameterization
In some algorithms like PFC, the input trajectory can be parameterized using
continuous basis functions like polynomials. This can be useful if the objective
is to follow smooth setpoint trajectories precisely, such as in mechanical servo
applications, and the sampling time cannot be made sufficiently small to allow
this with piecewise constant inputs.
In other commercial algorithms like HIECON and IDCOM-M, only a single input move is computed at each control execution.
3.8.6 Model Conditioning
3.8.7
3.9
3.9.1
Figure 3.6: Open-Loop vs. Closed-Loop Update of the Model and Corresponding Prediction
For the step response model, we can modify the update equation to add
the step for feedback measurement based correction of the model state:
Model Prediction:
Y(k|k-1) = M\, Y(k-1|k-1) + S\, \Delta v(k-1)   (3.62)
where Y(\cdot|k-1) denotes the estimate of Y(\cdot) obtained at k-1, taking into account all measurement information up to k-1. Recall that this is the sole step we took to update the model state in the previous formulation. Here we
postulate to correct the model prediction Y(k|k-1) based on the difference between the measurement y_m(k) at time k and the model prediction y_0(k|k-1), the first output vector appearing in Y(k|k-1), for this time step.
Correction:
Y(k|k) = Y(k|k-1) + K \left( y_m(k) - y_0(k|k-1) \right)   (3.63)
For a stable system a simple choice is to add the error as a constant bias to every element of the state:

K = \bar{I} = \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \;\; (n \text{ blocks})   (3.64)
Element by element this equation is

y_i(k|k) = y_i(k|k-1) + \left( y_m(k) - y_0(k|k-1) \right), \quad i = 0, \ldots, n-1.   (3.65)
INCLUDE!
Figure 3.7: Step Response of An Integrating System
Substituting (3.62) into (3.63) gives a single recursion,

Y(k|k) = M\, Y(k-1|k-1) + S\, \Delta v(k-1) + \bar{I} \left( y_m(k) - N \left[ M\, Y(k-1|k-1) + S\, \Delta v(k-1) \right] \right)

where

N = \begin{bmatrix} I & \underbrace{0 \;\; 0 \;\; \ldots \;\; 0}_{n-1} \end{bmatrix}

which allows one to compute the current state estimate Y(k|k) based on the previous estimate Y(k-1|k-1), the previous input move \Delta v(k-1) and the current measurement y_m(k). \bar{I} is referred to as the observer gain.
An advantage of the above formulation is that substantial theories exist
that enable us to design K optimally so as to account for information we may
have about the statistical characteristics of disturbances and measurement
noise. Another advantage is that it can be generalized to handle systems with
unstable dynamics. We note that running an unstable system model in open
loop would lead to an OVERFLOW in the computer. The noise filtering
issue will be discussed in a simplified way below. The more rigorous general
treatment of the design of K and its accompanying properties will be given in
Chapter ?? in the advanced part of the book.
3.9.2 Integrating System
For an integrating system the step response does not settle but, after n steps, increases linearly,

s_{n+1} - s_n = s_n - s_{n-1},   (3.66)

so the state element beyond the truncation horizon is obtained by linear extrapolation,

\tilde{y}_n(k) = 2\, \tilde{y}_{n-1}(k) - \tilde{y}_{n-2}(k).   (3.67)

Then, we can represent the state transition from one sample time to the next as

Y(k+1) = M^I\, Y(k) + S\, \Delta v(k)   (3.68)

where S = \left[ S_1, \ldots, S_n \right]^T as before and

M^I = \begin{bmatrix}
0 & I & 0 & \ldots & \ldots & 0 \\
0 & 0 & I & 0 & \ldots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & \ldots & \ldots & 0 & I \\
0 & 0 & \ldots & \ldots & -I & 2I
\end{bmatrix}   (3.69)
where M I represents essentially the same shift operation as before except the
way the last set of elements are constructed. Note that the assumption of
pure linear rise after n steps in the step response translates into the transition
equation of
\tilde{y}_{n-1}(k+1) = 2\, \tilde{y}_{n-1}(k) - \tilde{y}_{n-2}(k) + S_n\, \Delta v(k).   (3.70)
For the integrating system, the two-point extrapolation of the prediction error changes the prediction equation from before to

Y(k+1|k) =
\underbrace{\begin{bmatrix} \tilde{y}_1(k) \\ \tilde{y}_2(k) \\ \vdots \\ \tilde{y}_p(k) \end{bmatrix}}_{M^I Y(k)\text{: from the memory}}
+ \underbrace{\begin{bmatrix} S_1^d \\ S_2^d \\ \vdots \\ S_p^d \end{bmatrix} \Delta d(k)}_{S^d \Delta d(k)\text{: feedforward term}}
+ \underbrace{\begin{bmatrix}
w_y(k|k) + \left( w_y(k|k) - w_y(k-1|k-1) \right) \\
w_y(k|k) + 2 \left( w_y(k|k) - w_y(k-1|k-1) \right) \\
\vdots \\
w_y(k|k) + p \left( w_y(k|k) - w_y(k-1|k-1) \right)
\end{bmatrix}}_{\text{feedback term}}
+ \underbrace{\begin{bmatrix}
S_1^u & 0 & \ldots & 0 \\
S_2^u & S_1^u & \ldots & 0 \\
\vdots & & \ddots & \vdots \\
S_m^u & S_{m-1}^u & \ldots & S_1^u \\
\vdots & & & \vdots \\
S_p^u & S_{p-1}^u & \ldots & S_{p-m+1}^u
\end{bmatrix}}_{S^u\text{: dynamic matrix}}
\underbrace{\begin{bmatrix} \Delta u(k|k) \\ \Delta u(k+1|k) \\ \vdots \\ \Delta u(k+m-1|k) \end{bmatrix}}_{\Delta U(k)\text{: future input moves}}   (3.71)
where w_y(k|k) = y_m(k) - \tilde{y}_0(k) represents the model prediction error. Notice that we have used two-point linear extrapolation in projecting the error into the future. In the above,
M = \begin{bmatrix}
0 & I & 0 & \ldots & \ldots & 0 & 0 \\
0 & 0 & I & 0 & \ldots & 0 & 0 \\
\vdots & & & \ddots & & \vdots & \vdots \\
0 & 0 & \ldots & \ldots & \ldots & 0 & I \\
0 & 0 & \ldots & \ldots & \ldots & -I & 2I \\
0 & 0 & \ldots & \ldots & \ldots & -2I & 3I \\
\vdots & \vdots & & & & \vdots & \vdots
\end{bmatrix} \;\Big\}\; p   (3.72)
Here we assumed p > n. If not, one can simply take the first p rows.
However, there is a problem in using the above in practice: the open-loop model prediction of the output (terms like \tilde{y}_i(k)) will increase without bound for an integrating system. Therefore the model prediction must be corrected with the feedback information before it is propagated:

Model Prediction:

Y(k|k-1) = M^I\, Y(k-1|k-1) + S\, \Delta v(k-1)   (3.73)

Correction:

Y(k|k) = Y(k|k-1) + \left( \bar{I} + \bar{I}_0 \right) \left( y_m(k) - y_0(k|k-1) \right)   (3.74)

where

\bar{I}_0 = \begin{bmatrix} 0 \\ I \\ 2I \\ \vdots \\ (n-1) I \end{bmatrix}   (3.75)

and hence

\bar{I} + \bar{I}_0 = \begin{bmatrix} I \\ 2I \\ 3I \\ \vdots \\ n I \end{bmatrix}.   (3.76)
3.9.3 Noise Filter

Let us take another look at the correction steps for both stable systems,

Y(k|k) = Y(k|k-1) + \bar{I} \left( y_m(k) - y_0(k|k-1) \right)   (3.63)

and integrating systems,

Y(k|k) = Y(k|k-1) + \left( \bar{I} + \bar{I}_0 \right) \left( y_m(k) - y_0(k|k-1) \right).   (3.74)
Figure 3.8: Motivation for observer correction term (\epsilon = y_m(k) - y_0(k|k-1)).
In both cases, we determine the difference between the model prediction
y0 (k|k 1) and the measurement ym (k) and add it either as a constant bias or
a ramp bias to the model prediction Y (k|k 1) to obtain the corrected prediction Y (k|k). This is justifiable, for example, if the difference is solely due to
a constant disturbance effect. It could, however, be solely due to measurement
noise, in which case we would not want to correct the model prediction at all.
In general, disturbances, model error, and measurement noise will contribute
to the difference, in which case a more cautious correction than implied by
(3.63) and (3.74) will be appropriate. This can be achieved by filtering the
correction term in (3.63):
Y(k|k) = Y(k|k-1) + \bar{I} F \left[ y_m(k) - y_0(k|k-1) \right]   (3.77)

where

F = \text{diag}\left\{ f_1, f_2, \ldots, f_{n_y} \right\}   (3.78)

0 < f_\ell \le 1.   (3.79)
(3.79)
Thus, rather than correct the model prediction by the full error [y_m(k) - y_0(k|k-1)], one takes a more cautious approach and utilizes only a fraction f_\ell. The larger the measurement noise associated with output y_\ell, the smaller f_\ell should be chosen.
To understand better the effect of this noise filter on control performance,
assume that the output suddenly changes to a constant value (output dis-
turbance), y_m(k) = \bar{y}, and that neither the disturbance nor the manipulated variables change (\Delta d(k) = 0, \Delta u(k) = 0). Then we find from (3.62)

Model Prediction:

Y(k|k-1) = M\, Y(k-1|k-1)   (3.80)

Correction:

Y(k|k) = Y(k|k-1) + \bar{I} F \left( \bar{y} - N\, Y(k|k-1) \right)   (3.81)

where

N = \begin{bmatrix} I & \underbrace{0 \;\; \ldots \;\; 0}_{n-1} \end{bmatrix}.   (3.82)
The form suggests (and we shall rigorously prove this in the advanced part) that Y(k|k) corresponds to y_m(k) passed through a first-order filter. Indeed, the estimate Y(k|k) approaches the true value \bar{y} with the filter time constant

\text{stable system:} \quad \tau_\ell = -\, T / \ln(1 - f_\ell)   (3.83)
where T is the sample time. In principle, for integrating systems, we could also detune the observer gain \bar{I} + \bar{I}_0 of (3.76) by post-multiplying it by a filter matrix F. However, this choice tends to lead to a highly oscillatory observer response and is therefore undesirable. As we will show in the advanced part, it is more desirable to introduce the filter in the following manner into (3.74):

Y(k|k) = Y(k|k-1) + \left( \bar{I} F + \bar{I}_0 F' \right) \left( y_m(k) - y_0(k|k-1) \right)   (3.84)

where F was defined as above (3.78) and

F' = \text{diag}\left\{ f_1', f_2', \ldots, f_{n_y}' \right\}   (3.85)

f_i' = \frac{f_i^2}{2 - f_i}.   (3.86)
Thus, for both stable systems (3.77) and integrating systems (3.84), we have a single tuning parameter 0 < f_\ell \le 1 for each output. The noise filtering action decreases with increasing f_\ell. For f_\ell = 1, measurement noise is not filtered at all and we recover (3.63) and (3.74).
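A compact SISO sketch of the filtered corrections (3.77) and (3.84) (function and variable names are ours):

```python
import numpy as np

def corrected_state(Y_pred, ym, f, integrating=False):
    """Filtered correction for a SISO model: add a fraction of the
    prediction error as a constant bias, (3.77), or as a ramp bias,
    (3.84), to the predicted state Y(k|k-1)."""
    n = len(Y_pred)
    err = ym - Y_pred[0]
    if not integrating:
        gain = f * np.ones(n)                      # I_bar * F
    else:
        fp = f**2 / (2.0 - f)                      # f' of (3.86)
        gain = f * np.ones(n) + fp * np.arange(n)  # I_bar F + I_bar_0 F'
    return Y_pred + gain * err
```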
In general, there will be both stable and integrating outputs and the filter gain is chosen for each output either as suggested in (3.77) or in (3.84). This leads to the following correction expression with the general filter gain K_F:

Y(k|k) = Y(k|k-1) + K_F \left( y_m(k) - y_0(k|k-1) \right)   (3.87)

3.9.4 Bi-Level Optimization
The MPC calculation can be split into two parts for added flexibility. First a local steady-state optimization can be performed to obtain target values for each input and output. This can be followed by a dynamic optimization to determine the most desirable dynamic trajectory to these target values. Even though the local steady-state optimization can be based on an economic index, it does not replace the more comprehensive nonlinear optimization that often runs above the MPC layer at a much slower rate in order to provide an optimal range of inputs and outputs for the plant condition experienced during a particular optimization cycle. The local optimization performed in MPC is based on a linear steady-state model, which may be obtained by linearizing a nonlinear model or is simply the steady-state version of the step response model used in the dynamic optimization.
The reason for running the local optimization may vary. For example,
one may want to perform an economic optimization at a higher frequency to
account for local disturbances. Even if there is no economic objective in the
given control problem, the steady-state optimization can be helpful to determine best feasible target values for CVs and the corresponding MV settling
values.
The two-stage optimization can be formulated as below:
Step 1: Steady-State Optimization
The general form of a steady-state prediction equation is

\bar{y}(\infty|k) = K_s \underbrace{\left( \bar{u}(\infty|k) - u(k-1) \right)}_{\Delta u_s(k)} + b(k)   (3.88)

where \bar{y}(\infty|k) and \bar{u}(\infty|k) are the steady state values of the output and input projected at time k, and b(k) is the predicted output settling value if no further input moves are made, available from the (corrected) model state. With only m input moves considered,

\Delta u_s(k) = \Delta u(k) + \Delta u(k+1) + \ldots + \Delta u(k+m-1)   (3.89)

and K_s = S_n.
The steady-state optimization is posed, for example, as

\min_{\Delta u_s(k)} \left\| \Delta u_s(k) \right\|   (3.93)

subject to the steady-state input and output constraints, or as

\min_{\Delta u_s(k)} \left\| r - \bar{y}(\infty|k) \right\|_Q.   (3.94)
In the first case, we would be looking for a minimum input change such
that all the constraints are satisfied. In the second case, we would be seeking
a minimum deviation from the setpoint values that are achievable within
the given constraints. The solution sets the target settling values for the
inputs and outputs.
Step 2: Dynamic Optimization
The dynamic prediction equation is the same as before. A quadratic regulation objective of the following form is minimized subject to the given constraints through QP:

\min_{\Delta U(k)} \sum_{i=1}^{m+n-2} \left[ y(k+i|k) - \bar{y}(\infty|k) \right]^T Q \left[ y(k+i|k) - \bar{y}(\infty|k) \right] + \sum_{j=0}^{m-1} \Delta u^T(k+j|k)\, R\, \Delta u(k+j|k)   (3.95)

where \bar{y}(\infty|k) is the solution from the steady-state optimization. An additional constraint may be added to match the settling values of the optimized input trajectories to those computed from the steady-state optimization:

\sum_{j=0}^{m-1} \Delta u(k+j|k) = \Delta u_s(k).   (3.96)
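A sketch of the steady-state target calculation (3.94) with simple input bounds as the constraint set (all names are ours; a real implementation would also include steady-state output constraints):

```python
import numpy as np
from scipy.optimize import minimize

def steady_state_targets(Ks, b, u_prev, r, Q, u_low, u_high):
    """Sketch of (3.94): choose the input settling value u(inf) to bring
    the predicted steady state y(inf) = Ks (u(inf) - u(k-1)) + b
    as close as possible to the setpoint r, within input bounds."""
    def cost(us):
        y_inf = Ks @ (us - u_prev) + b
        return (r - y_inf) @ Q @ (r - y_inf)
    res = minimize(cost, u_prev, bounds=list(zip(u_low, u_high)),
                   method="L-BFGS-B")
    u_target = res.x
    return u_target, Ks @ (u_target - u_prev) + b
```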
3.9.5
Many CVs, such as product compositions and other property variables, cannot be measured with the frequency, speed, accuracy, and/or reliability required for direct feedback control to be effective. Control of these variables may be
nevertheless critical and the only recourse may be to develop and use estimates
from measurements of other process variables. Since all process variables are
driven by a same basic set of disturbances and inputs, their behavior should
be strongly correlated. The correlation can be captured, which can be used to
build an inferential estimator for the property variables.
Typically, linear regression techniques, such as least squares and partial
least squares (PLS), are used to build an estimator of the form
y1s (k 1 )
..
yp (k) = L
.
s
y` (k ` )
(3.97)
where yp is the estimate of the product property variable in question and yis s
are he secondary variables used to estimate the product property. Because
different variables can have different response times to various inputs, the
regressor may need to include delayed measurement terms as shown above.
Determination of appropriate delay amounts would require significant process
knowledge or a careful data analysis. In the case that proper values for these
delays cannot be determined a priori, one may have to include several delayed
terms of a same variable in the regressor.
When direct measurements of the product property variables are available,
one may use them in conjunction with the inferential estimates. In practice,
the direct measurements are typically used to correct any bias in the inferential
estimate. For example, when a measurement of y p becomes available after a
delay of d sample steps, it can be included in the estimator in the following
way:
p
yp (k) = yp (k) + (ym
(k d ) yp (k d ))
(3.98)
p
is the measured value of y p and yp (k) is the measurement-corrected
where ym
estimate.
In the case that the process is highly nonlinear and / or the operation range
is wide, nonlinear regression techniques such as Artificial Neural Networks can
be used in place of the least squares technique.
64
Naphtha
Crude Oil
Vaporizer
Light Gas Oil
Short Residue
Heavy oil
Fractionator
Figure 3.9: Heavy Oil Fractionator
3.10
3.10.1
This case study is intended to show how the MATLAB MPC Toolbox can be
used to design a model predictive controller for a fairly complex, practically
motivated control problem and test it through simulation. The case study is on
a heavy-oil fractionation column and involves multivariable control, constraint
handling, and optimization. The interested readers may attempt to solve the
problem using the standard step response based DMC technique we explained
in this chapter. A solution will be provided at the end of the section.
65
February 8, 2002
PC
reflux drum
(F )
1 s
LC
FC
Upper reflux duty
(URD)
F
1
(IRD)
(F )
2 s
FC
F
2
(Q)s
Bottoms reflux
duty (BRD)
FC
Q
LC
Feed
Bottoms
66
reflux drum
T
LC
FC
Upper reflux duty
Top draw
stripper
FC
T
Bottoms reflux
duty
FC
Side draw
LC
T
Bottoms
Feed
Models relating the MVs to the CVs and other outputs are given as below:
T EP y1
SEP y2
T T y3
U RT y4
SDT y5
IRT y6
BRT y6
TD
u1
SD
u2
BRD
u3
U RD
d1
IRD
d2
4.05e27s
50s+1
5.39e18s
50s+1
3.66e2s
9s+1
5.92e11s
12s+1
4.13e5s
8s+1
4.06e8s
13s+1
4.38e20s
33s+1
1.77e28s
60s+1
5.72e14s
60s+1
1.65e20s
30s+1
2.54e12s
27s+1
2.38e7s
19s+1
4.18e4s
33s+1
4.42e22s
44s+1
5.88e27s
50s+1
6.9e15s
40s+1
5.53e2s
40s+1
8.1e2s
20s+1
6.23e2s
10s+1
6.53e1s
9s+1
7.2
19s+1
1.2e27s
45s+1
1.52e15s
25s+1
1.16
11s+1
1.73
5s+1
1.31
2s+1
1.19
19s+1
1.14
27s+1
1.44e27s
40s+1
1.83e15s
20s+1
1.27
6s+1
1.79
19s+1
1.26
22s+1
1.17
24s+1
1.26
32s+1
The transfer functions for the actual plant are assumed to be the same as
the model, except that there can be gain variations. Structure of the variations
is shown below. They are to be incorporated into the simulation as a plantmodel mismatch.
67
February 8, 2002
T EP y1
SEP y2
T T y3
U RT y4
SDT y5
IRT y6
BRT y6
TD
u1
4.05 + 2.111
5.39 + 3.291
3.66 + 2.291
5.92 + 2.341
4.13 + 1.711
4.06 + 2.391
4.38 + 3.111
SD
u2
1.77 + 0.392
5.72 + 0.572
1.65 + 0.352
2.54 + 0.242
2.38 + 0.932
4.18 + 0.355
4.42 + 0.732
BRD
u3
5.88 + 0.593
6.90 + 0.893
5.53 + 0.673
8.10 + 0.323
6.23 + 0.303
6.53 + 0.723
7.2 + 1.333
U RD
d1
1.2 + 0.124
1.52 + 0.134
1.16 + 0.084
1.73 + 0.024
1.31 + 0.034
1.19 + 0.084
1.14 + 0.184
IRD
d2
1.44 + 0.165
1.83 + 0.135
1.27 + 0.085
1.79 + 0.045
1.26 + 0.025
1.17 + 0.015
1.26 + 0.185
Problem Statement
The following are the five cases representing different disturbances and gain
variations.
Case
Case
Case
Case
Case
I:
II:
III:
IV:
V:
d1
0.5
- 0.5
-0.5
0.5
-0.5
d2
0.5
-0.5
-0.5
-0.5
-0.5
1
0
-1
1
1
-1
2
0
-1
-1
1
1
3
0
-1
1
1
0
4
0
1
1
1
0
5
0
1
1
1
0
68
Manipulated Variables
u1, u2, u3
HEAVY OIL
FRACTIONATOR
Optimized input u3
Disturbances
Associate Variable y7
d1 and d2
Note that d1 , d2 are unmeasured disturbances. Therefore the model for the
MPC calculation should not contain the models d1 , d2 . These models appear
only in the plant used to do the closed-loop simulation.
y1
u
1
y2
u
u2
y7 = G
u3
u3
(3.99)
69
February 8, 2002
where,
Gu =
g71 g72 g73
0
0
1
(3.100)
y1
u1
y2
d
1
u
d
u2 + G
y7 = G
d2
u3
u3
(3.101)
We solve the control problem using the cmpc algorithm available in the
MPC toolbox5 . The cmpc algorithm solves, as the name suggests, Constrained
MPC problem. The description of cmpc could be found in the MPC toolbox
manual or by typing help cmpc at the MATLAB prompt. Resulting help text
displayed is:
CMPC Simulation of the constrained Model Predictive Controller. [y,u,ym]
= cmpc(plant, model, ywt, uwt,M,P, tend, r, ulim,ylim,... tfilter, dplant,
dmodel, dstep) REQUIRED INPUTS: . . . . . .
The first step in solving the problem is to obtain the step response matrices
plant and model. These two models are required to be in step response form.
We use the following MATLAB commands 6 to do this.
First, we obtain the models in transfer function form using poly2tfd function. You may refer to the MPC toolbox manual for more details. The usage of
this function to obtain the transfer function g11 is shown. The third argument
being 0 signifies continuous transfer function form and the final argument is
the time delay. g11 = poly2tfd(4.05, [50 1], 0, 27]; The resulting matrix
e27s
corresponds to the transfer function 4.05
50s+1 . We then use the tfd2step
function to get them in the required form. Before that, as described above,
we obtain all the necessary transfer functions (see Eq. 3.100). The transfer
functions are passed column-wise (and not row-wise). Scalar Ny specifies the
number of outputs (ie. number of rows).
delt=5; tfin=300; Ny=4;
model = tfd2step(tfin,delt,Ny,g11,g21,g71,gu31,g12,g22,g72,gu32,g13,g23,g73,gu33);
5
Please refer to the MPC toolbox manual for details. You may also use the GUI (Graphical
User Interface) version or SIMULINK, instead
6
The commands described may be MATLAB commands or MPC Toolbox commands. In
this example, we will use the same term in describing both. We assume here that you have
MPC Toolbox installed. If not, some of these commands will not work.
70
You may want to see the validity of the tfd2step command by actually
computing the step response matrix yourself. One way to do this is to find the
inverse Laplace transform of a transfer function and obtain the step response
matrix. Remember that, to obtain the step response, you need to multiply the
transfer function with 1/s (output transform for a step input) before taking
the inverse Laplace transform. First few elements of model are listed below:
model =
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.2113
..
.
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0945
0.0000
0.0000
0.0000
0.5443
..
.
0.0000
0.0000
1.6659
1.0000
0.0000
0.0000
2.9464
1.0000
0.0000
0.0000
3.9306
1.0000
0.0000
0.8108
..
.
February 8, 2002
71
[19 1]);
gu31=poly2tfd(0, [1]); gu32=poly2tfd(0, [1]); gu33=poly2tfd(1, [1]);
delt=5; tfin=300; Ny=4;
model = tfd2step(tfin,delt,Ny,g11,g21,g71,gu31,g12,g22,g72,gu32,g13,g23,g73,gu33);
E1=-1; E2=-1; E3=-1; E4=1; E5=1; d1=-0.5; d2=-0.5; y7min=-0.5;
c11=4.05+2.11*E1; c12=1.77+0.39*E2; c13=5.88+0.59*E3; c21=5.39+3.29*E1;
c22=5.72+0.57*E2; c23=6.90+0.89*E3; c71=4.38+3.11*E1; c72=4.42+0.73*E2;
c73=7.20+1.33*E3;
g11=poly2tfd(c11, [50 1], 0, 27); g12=poly2tfd(c12, [60 1], 0, 28); g13=poly2tfd(c13,
[50 1], 0, 27);
g21=poly2tfd(c21, [50 1], 0, 18); g22=poly2tfd(c22, [60 1], 0, 14); g23=poly2tfd(c23,
[40 1], 0, 15);
g71=poly2tfd(c71, [33 1], 0, 20); g72=poly2tfd(c72, [44 1], 0, 22); g73=poly2tfd(c73,
[19 1]);
plant = tfd2step(tfin,delt,Ny,g11,g21,g71,gu31,g12,g22,g72,gu32,g13,g23,g73,gu33);
dc11=1.20+0.12*E4; dc12=1.44+0.16*E5; dc21=1.52+0.13*E4; dc22=1.83+0.13*E5;
dc71=1.14+0.18*E4; dc72=1.26+0.18*E5;
d11=poly2tfd(dc11, [45 1], 0, 27); d12=poly2tfd(dc12, [40 1], 0, 27); d21=poly2tfd(dc21,
[25 1], 0, 15); d22=poly2tfd(dc22, [20 1], 0, 15); d71=poly2tfd(dc71, [27 1]);
d72=poly2tfd(dc72, [32 1]);
du1=poly2tfd(0, [1]); du2=poly2tfd(0, [1]);
dplant = tfd2step(tfin,delt,Ny,d11,d21,d71,du1,d12,d22,d72,du2);
dmodel=[];
ywt=[1 1 0 1]; uwt=[0.1 0.1 0.1]; M=10; P=20; tend=300; r=[0 0 y7min
0];
ulim=[-0.5 -0.5 -0.5 0.5 0.5 0.5 0.05 0.05 0.05];
ylim=[-0.5 -inf y7min -0.5 0.5 inf inf 0.5];
tfilter=[]; dstep=[d1 d2];
[yp,u] = cmpc(plant,model,ywt,uwt,M,P,tend,r,ulim,ylim,tfilter,dplant,dmodel,dstep);
figure (3); plotall (yp,u,delt); subplot(211); legend(y1,y2,y7,u3,0); subplot(212); legend(u1,u2,u3,0);
Modifying the above for the other cases is trivial. Just the values for
E1. . . E5, D1 and D2 need to be changed. The cases of u1 actuator being
stuck and no feedback for output y1 are left as an exercise. Helpful hints:
For a stuck actuator, the respective input should be constrained to a
very small region of operation.
72
Outputs
0.2
0
0.2
0.4
0.6
y1
y2
y7
u3
0.8
1
1.2
50
100
150
Time
200
250
Manipulated Variables
0.6
0.5
0.4
u1
u2
u3
0.3
0.2
0.1
0
0.1
50
100
150
Time
200
250
February 8, 2002
If feedback filtering time delay is , we get no feedback.
73
MODEL PREDICTIVE
CONTROL
Manfred Morari
Jay H. Lee
Carlos E. Garca
Chapter 8
Modeling
8.1
8.1.1
State-Space Model
Continuous Time
(8.1)
x denotes the state vector, v denotes the input vector containing the independent variables, and y denotes the output vector comprising linear combinations
of the state variables. The state can be composed of physical variables if the
model is derived from first principles, or of some artificial variables storing the
information about the past inputs needed to predict the future behavior of the
outputs.
Solution to the above differential equation with initial condition x(0) = x0
is
x(t) = e
Ac t
x0 +
eA
c (t )
Bv( )d
(8.2)
The matrix exponential in the above is defined with the infinite power series
e
Ac t
X
(Ac t)n
n=0
n!
(8.3)
and can be evaluated conveniently using the Cayley Hamilton Theorem discussed in Appendix ??.
8.1.2
Discrete Time
(8.4)
The typical assumption underlying this representation is that the system input v(t) changes only at discrete equally spaced times (. . . , tk , tk+1 , . . .) (see
Figure 8.1). The model describes the behavior of the state only at the discrete sample times. hs = tk+1 tk is the sampling time. The discrete-time
description can be obtained from the continuous state-space model and the
relationship between the continuous system matrices and the discrete system
matrices will be derived shortly. It can also come directly from system identification, which is the subject of Chapter ??.
(8.5)
where x(k) represents the value of x at the kth sample time, and so on.
Notice that, with the above model, one gets a delay of at least one sample
interval between v and y. This reflects the fact that the output does not
respond instantaneously to an input change.
Representation of time delays within the discrete-time setting is quite
straightforward. For instance, consider the case where the input has a delay of d sampling units (where d 1). The system description is
x(k + 1) = Ax(k) + Bv(k d)
y(k) = Cx(k)
which can be converted into the
state in the following way:
A B
x(k + 1)
0 0
v(k d + 1)
.. . .
..
.
=
.
.
.
.
v(k 1)
..
..
v(k)
0
y(k) = Cx(k)
(8.6)
0
x(k)
0
v(k
d)
..
0
I v(k 2)
v(k 1)
0
0
0
..
.
+
0
I
v(k)
(8.7)
Different units of delays in the individual input channels can be handled
efficiently in the same straightforward manner. Delays in the output channels
can also be handled in a similar manner.
Example 8.1 Consider the system
Modeling
(8.8)
The above system has a delay of three sampling times (3hs ), including the one
sampley delay inherent to every system, between the input v and the output y.
The corresponding state-space description is
x1 (k + 1)
0.5 1
x2 (k + 1) = 0 0
x3 (k + 1)
0 0
y(k) = x1 (k)
8.1.3
0
x1 (k)
0
1 x2 (k) + 0 v(k)
x3 (k)
0
1
(8.9)
Recall that the solution to the differential equation (8.1) with initial condition
x(0) = x0 is
Z t
c
Ac t
x(t) = e x0 +
eA (t ) B c v( )d
(8.10)
0
With the change of variable = tk+1 , we obtain the corresponding discretetime system
x(k + 1) = Ax(k) + Bv(k)
(8.12)
y(k) = Cx(k),
where
A = eA hs
Rh
c
B = 0 s eA d B c
(8.13)
(8.13) gives formulas for converting continuous-time system matrices into discretetime system matrices.
When the continuous time system has a delay, the conversion procedure
becomes more complicated. As an example, let us consider the continuoustime state-space model of (8.1) that has an input delay of . Suppose for a
moment that 0 < hs . Then, the input v(t ) takes the value
v(k 1)
for tk t < tk +
v(t ) =
(8.14)
v(k)
for tk + t < tk+1
during the time interval between tk and tk+1 . Then, integration of (8.1) from
tk to tk+1 yields
x(k + 1) = Ax(k) + B1 v(k 1) + B0 v(k)
(8.15)
R c
c
B1 = eA (hs ) 0 eA d B c
R hs Ac
e
d B c
B0 = 0
(8.16)
The above can be put in the standard form by augmenting the state x(k) with
v(k 1):
B0
x(k)
A B1
x(k + 1)
v(k)
(8.17)
+
=
I
v(k 1)
0 0
v(k)
Note that the state vector now includes v(k 1) since its effect is not fully
stored in x(k) due to the delay.
0
0
v(k d)
for tk t < tk +
v(t ) =
(8.18)
0
v(k d + 1)
for tk + t < tk+1
Then integration of (8.1) from tk to tk+1 yields
x(k + 1) = Ax(k) + B1 v(k d) + B0 v(k d + 1)
where B1 and B0 are defined same
standard state-space representation
A B 1 B0
x(k + 1)
0 0
I
v(k d + 1)
.
.
.
.
..
.. ..
= .
.
. .
v(k 1) ..
.. ...
v(k)
0
(8.19)
0
0 0
0
x(k)
0 0
0
v(k
d)
.. ..
..
..
. 0
.
+ . v(k)
.. ..
.
. I v(k 2) 0
I
v(k 1)
0
(8.20)
For systems with different input and output delays, the procedure can become very complex. One immediate way is to discretize the model for each
input-output pair separately and then pack them together into a single
model afterward (as described in Exercise). The resulting model is likely
to include superfluous states as some of the states for different input-output
pairs may be merged and shared. Later in this chapter, we will describe a
procedure called model reduction, which can be used to get rid of redundant
or negligible states.
8.1.4
State-Coordinate Transformation
Modeling
(8.21)
Rearranging the above gives the state-space system model for the new coordinate system
1
x
(k + 1) = T
(k) + |{z}
T B v(k)
| AT
{z } x
A
B
(8.22)
1
x
(k)
y(k) = CT
| {z }
8.1.5
(8.23)
8.1.6
Stability
chosen as the eigenvectors of A. Let us assume for convenience that A has a full
set of eigenvectors e1 , , en . Then, performing the coordinate transformation
x
=
e1 e n
x
(k + 1) = e1
1
..
=
.
n
en
e1 e n
(8.24)
(8.25)
Hence, in the transformed coordinates, the behavior of each state is independent of one another and can be analyzed on a separate basis. From the
equations x
i (k + 1) = i x
i (k), i = 1, , n, it is clear that all i s must have
magnitudes less than zero for asymptotic stability.
8.1.7
Lyapunov Equation
= xT (k) AT P A P x(k)
(8.26)
(8.27)
The above form of equation is referred to as a Lyapunov equation for discretetime systems. It is true (and can be proven) that a positive-definite solution
(P > 0) to the Lyapunov equation always exists for any given Q > 0 if all the
eigenvalues of A are strictly inside the unit disk. It is given as a homework
exercise to prove this.
Modeling
x1 (k + 1)
0.2 0.5
x1 (k)
1
=
+
v(k)
x2 (k + 1)
0.0 1.0
x
(k)
0
2
(8.28)
x1 (k)
1 1
y(k) =
x2 (k)
Note that the system is BIBO stable since the integrating state x2 is not affected
by input v(k) and hence does not ramp up or down under a constant input.
However, the system is not asymptotically stable since x2 remains constant
and does not return to zero.
8.1.8
B AB An1 B
(8.29)
(8.30)
This condition can be easily proved using the Cayley Hamilton Theorem
in Appendix ??. First, note that
v(k 1)
v(k 2)
x(k) = Ak x(0) + B AB Ak1 B
(8.31)
..
.
v(0)
Reachability requires that, for all choices of x(0) <n and xt <n ,
there be an input
sequence (v(0), ,v(k 1)) such that x(k) = xt .
A I B
=n
A
(8.32)
rank
where A denotes the spectrum (the set of all eigenvalues) of A. To see
how the above condition arises, let us first prove that
Reachability
rank A I B = n A
For this, we can prove instead
Not Reachable
xT B AB An1 B = xT B xT B n1 xT B = 0T
10
Modeling
This implies that the controllability matrix does not have a full row rank
and therefore system is not reachable.
We must also prove that
Reachability
rank
Reachable
rank
A I B
A I B
= n A
6= n
for some A
xT A I B = 0T for some A
which means the rank is not n.
x
1
x, that
There exists a coordinate transformation,
= T1c T2c
x
2
transforms the state-space model into the form of
1
x
1 (k + 1)
A11 A12
x
1 (k)
B
v(k),
(8.33)
=
+
x
2 (k + 1)
x
2 (k)
0
0 A22
1 ) is a reachable pair. This form is intuitive and convenient as
where (A11 , B
x
1 represents the part of the state that is reachable and x
2 represents the part
that is not (and cannot be affected by the input at all).
To put the model in the above form, we must choose T1c and T2c such that
the columns of T1c span the range space of Wnc and T2c its orthogonal complement. One can find such matrices, for instance, by performing a singular value
decomposition on the controllability matrix,
1 0
V1
Wnc = U1 U2
(8.34)
0 0
V2T
and choosing
T1c = U1 ,
T2c = U2
(8.35)
If all the eigenvalues of A22 are zero, the system is controllable. If the
eigenvalues of A22 in (8.33) lie strictly inside the unit disk, the system is
stabilizable. This means that the states that are not reachable evolve through
stable dynamics.
11
A I B
+
A
=n
(8.36)
where +
A is the the set of all eigenvalues of A lying on or outside the unit
disk. This can be proved by following essentially the same argument as the
one we used for proving the analogous test for reachability. The proof is left
as a homework exercise.
It should be clear to the readers by now that the reachability and stabilizability are a systems intrinsic properties and lack of them can be overcome
only by changing the system dynamics or by choosing a different set of inputs.
Example 8.3 We will use the following simple system to illustrate the concept
of reachability and reachable subspace:
x1 (k + 1)
x2 (k + 1)
x1 (k)
x2 (k)
1
1
v(k)
(8.37)
B AB
1
1
2
2
The rank is 1 and the system is not reachable. SVD of W2c looks like
W2c
"
1
2
1
2
1
2
12
1 0
0 0
"
1
5
2
5
2
5
15
From this, we learn that the range space of W2c defined by the span of the first
output singular vector [ 12 12 ]T is the reachable subspace. The subspace is
graphically illustrated in Figure 8.2. Performing the coordinate transformation
#
" 1
1
x
1 (k)
x1 (k)
2
2
= 1
x
2 (k)
12
x2 (k)
2
we obtain
x
1 (k + 1)
x
2 (k + 1)
2 1
0 a
x
1 (k)
x
2 (k)
1
0
v(k)
Hence, x
1 = (x1+ x2 )/ 2 is the portion of the state that can be controlled and
x
2 = (x1 +x2 )/ 2 evolves according to the autonomous dynamics regardless of
the input. The stabilizability of the system depends on a. If a 1, the system
is not stabilizable, as shown by the possible control responses for various values
of a in Figure 8.2.
12
Modeling
Figure 8.2: Reachable Subspace and Possible Control Responses for Various
Values of a in Example 8.3
Example 8.4 Let us use the following state-space system to further illusrate
the concepts:
0.65
0.1
0
0.45
0.35
0.4
0.3
0.15
x(k) +
x(k + 1) =
0
0.25
0.35
0.2
0.4
0.2 0.05 0.15
2 0 0 2 x(k)
y(k) =
0.5
0.5
u(k)
0
0
(8.38)
0.5 0.375
0.3375
0.3263
0.5 0.375
0.3375
0.3263
W c = B AB A2 B A3 B =
0
0.125
0.1625
0.1738
0 0.125 0.1625 0.1738
The rank of the above matrix is 2 and therefore the system is not reachable.
The eigenvalues of
A are 1, 0, 0.5 and 0.3. Evaluating the rank of the
matrix A I B , we find that the rank is 4, 3, 3, 4 for = 1, 0, 0.5, 0.3.
Hence, we reach the same conclusion that the system is not reachable. Moreover, the two eigenvalues that resulted in loss of rank ( = 0, 0.5) are both
stable. So, the system is stabilizable.
13
u1 u2 u3 u4
0.6823
0.6823
=
0.1857
0.1857
0.5965
0.4880
0.4554
0.4457
0.1857
0.1857
0.6823
0.6823
0.7699
0.1296
0.3995
0.4804
T
v1
1 0 0 0
v2T
0 2 0 0
0 0 3 0 v3T
0 0 0 4 v4T
0.7071
0
1.1438
0
0 0
0.7071
0
0
0.2412 0 0
0
0
0 0
0
0.7071
0
0 0
0
0.7071 0
0.1184 0.1932
0.5943 0.6260
0.7476
0.2721
0.2716 0.7048
Note that the span of u1 and u2 represent the reachable subspace. Also, the
choices of u3 and u4 are not unique (since they both correspond to the zero
singular values) and any two orthonormal vectors in the span of the above
given u3 and u4 will work. For example, in the SVD, we could have used
instead
0.5
0.5
0.5
, u4 = 0.5
u3 =
0.5
0.5
0.5
0.5
since the span of the above two vectors are the same as that of the previous
choices of u3 and u4 . The same can be said for the choices of v3 and v4 .
A = U 1 AU =
0
0
0.3
0.3
0
0
0.2
0.2
0.6823
= U 1 B = 0.1857
B
0
0
C =
One can clearly see that the first two states in the transformed coordinates are
reachable
and the
14
Modeling
8.1.9
CT
(CA)T
(8.39)
(CAn1 )T
(8.40)
(8.41)
AT I C T
=n
(8.42)
15
As before,
we can find a coordinate transformation that puts (8.5) in the
x
1
x that transforms (8.5) into the form of
= T1o T2o
form of
x
2
1
x
1 (k + 1)
A11 0
x
1 (k)
B
=
+
v(k)
x
2 (k + 1)
x
2 (k)
A21 A22
B2
(8.43)
x
1 (k)
C1 0
y(k) =
x
2 (k)
where (C1 , A11 ) is an observable pair. In the above form, x
2 represents the
part of the state that is not observable from y. In other words, the value of x
2
does not have any effect on the output and hence cannot be observed from it.
To put the system in the above form, one must choose T2o so that the
columns span the null space of Wno and T1o as its orthogonal complement.
Again, SVD of Wno can be used conveniently to obtain such a transformation
as we will illustrate below.
System (8.43) (and hence original system (8.5)) is detectable if all the
eigenvalues of A22 lie inside the unit disk. A more explcit test for detectability
that can be derived from the duality argument is
T
A I C T
=n
+
(8.44)
rank
A
Example 8.5 Let us use the dual system of (8.37), which we used in Example 8.3) to elucidate the concept of observability and unobservable subspace:
(8.45)
x1 (k)
1 1
y(k) =
x2 (k)
The observability matrix is
W2o
CT
AT C T
1
1
2
2
The rank is 1 and the system is not reachable. SVD of W2o looks like
" 1
#
#
" 1
2
1
0
1
5
2
2
W2o = 25
1
1
1
0
0
5
5
2
2
The span of the second input vector [ 12 12 ]T is the null space of W2o , which
is the unobservable subspace of the system. This is illustrated in Figure 8.3.
Performing the coordinate transformation
#
" 1
1
x1 (k)
x
1 (k)
2
2
= 1
12
x2 (k)
x
2 (k)
2
16
Modeling
x
1 (k)
2 0
=
x
2 (k)
1 a
1
x
1 (k)
y(k) =
0
x
2 (k)
x
1 (k + 1)
x
2 (k + 1)
x
1 is the portion of the state that can be observed and x
2 the portion that
cannot be seen from the output at all as its evolution has no effect on y. The
detectability of the system depends on a. If a 1, the system is not detectable.
Example 8.6 . Let us bring back the system (8.38), which we used to demonstrate the use of the reachability and stabilizability tests earlier in Example 8.4.
This time, we will analyze the system for observability and detectability. First,
we calculate the observability matrix:
Wo =
CT
AT C T
(AT )2 C T
(AT )3 C T
T
2
1.7
1.55
1.475
T 0
0.3
0.45
0.525
=
0
0.3
0.45
0.525
2 1.7 1.55 1.475
The rank of the above matrix is 2 and therefore the system is not observable.
17
v1
1 0 0 0
0 2 0 0 v2T
u1 u2 u3 u4
(W o )T =
0 0 3 0 v3T
T
0 0 0 4
4
v
0.6964 0.1227 0.7018 0.0867
0.1227
0.6964
0.0867
0.7018
=
0.1227
4.8615
0
0 0
0
0.6618 0 0
0
0
0 0
0
0
0 0
Note that the span of u3 and u4 , which is the null space of W o , represents
the unobservable subspace. As before, the choices of u3 and u4 are not unique
since they both correspond to the zero singular values, and any two orthonormal
vectors in the span of the above-given u3 and u4 will work. For example, in
the SVD, we could have used instead
0.5
0.5
0.5
, u4 = 0.5
u3 =
0.5
0.5
0.5
0.5
since the span of the above two vectors are the same as that of the former
choices of u3 and u4 . The same can be said for the choices of v3 and v4 .
0.9294 0.1008
0
0
0.3008 0.5706
0
0
A = U 1 AU =
0.2549 0.0891 0.2350 0.0833
0.1261 0.0344 0.1833 0.065
0.4095
= U 1 B = 0.2868
B
0.3942
0.3075
C = 2.7855 0.4908 0 0
We can clearly see that, with this coordinate transformation, the first two states
are observable
and the next
18
Modeling
8.1.10
Kalmans Decomposition
Kalman (1961) showed that one can always find a coordinate transformation
yielding the following decomposition:
A11
x
1 (k + 1)
x
(k
+
1)
2
= 0
x
A31
3 (k + 1)
x
4 (k + 1)
0
y(k) =
C1
A12 0
0
x
1 (k)
x
A22 0
0
2 (k)
3 (k)
A32 A33 A34 x
x
4 (k)
A42 0 A44
x
1 (k)
x
2 (k)
C2 0 0
x
3 (k)
x
4 (k)
B1
0
+
B
3
0
v(k)
(8.46)
The decomposition clearly indicates the parts of the state that are reachable
and / or unobservable, i.e., x
1 through x
4 represent the parts of the state that
are both reachable and observable, observable but not reachable, reachable
but not observable, and neither reachable nor observable, respectively.
The Kalmans decomposition can be obtained by performing a coordinate
1
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
R = Span
,
;
S = Span
,
0.5 0.5
0.5 0.5
0.5
0.5
0.5
0.5
0.5
0.5
, R S = Span
R S = Span
0.5
0.5
0.5
0.5
0.5 0.5
,
0.5 0.5
0.5
0.5
0.5
0.5
,
0.5
0.5
the span of ([0.5, 0.5, 0.5, 0.5]) is both not reachable and not observable;
19
the span of ([0.5, 0.5, 0.5, 0.5]) is reachable (since its orthogonal to R )
but not observable;
the span of ([0.5, 0.5, 0.5, 0.5]) is observable (since its orthogonal to
S) but not reachable;
and the span of ([0.5, 0.5, 0.5, 0.5]), the orthogonal complement to R
S, is both reachable and observable.
Hence, lets do the coordinate transformation
1
0.5
0.5
0.5
0.5
0.5 0.5 0.5 0.5
z=
0.5 0.5 0.5 0.5 x
0.5 0.5 0.5
0.5
Then, we obtain
0
A = U 1 AU =
0.2
0
0.2 0
0
0.5 0
0
0.5
= U 1 B = 0
B
0.5
0
C = 2 2 0 0
8.1.11
Minimial Realization
We say that a state-space system is minimal if it is both reachable and observable. A minimal realization of a system is a minimal system that is equivalent
to the original system in the input-output system but is devoid of the unreachable and/or unobservable parts. These terminolgies convey the fact that, for
the purpose of describing input-output system dynamics, unreachable and/or
unobservable parts are superfluous and can be elminated without loss of information. Based on these definitions, a minimial realization of (8.46) which
results from Kalmans decomposition is
x
1 (k + 1) = A11 x
1 (k) + Bv(k)
y(k) = C1 x
1 (k)
(8.47)
20
Modeling
8.1.12
(8.48)
Model Reduction
A model of lower order is generally preferred as it implies reduced complexity and computational requirement for tcontrol system design and implementation. On the other hand, it is often the case that a particular modeling
method we use yields a high-order model that may not even be minimal. A
good example is the step response identification that we discussed earlier. For
systems with many inputs and outputs, one can easily end up with a model
with hundreds or even thousands of state variables. The same can be said
about first-principles modeling. This motivates us to develop a model reduction technique that can transform a high-order model into a low-order model
that still retains the essential system information for controller design.
Beyond removing superfluous states and constructing a minimal realization, model reduction in general involves trading off some accuracy in describing the systems input output behavior for a reduced system order. Typically,
this is carried out by sorting the states according to their importance through
some appropriate coordinate transformation and removing those judged negligible. Formally stated, for the nth order state-space system of (8.5), the
objective is to obtain an rth order model (r < n) that closely approximates
the input-output relationship of the original system.
Let us introduce the state coordinate transformation x
(k) = T x(k) where
T is a nonsingular matrix. The state-space system in the transformed state
coordinates can be written as
x(k) + Bv(k)
x
(k + 1) = A
y(k) = C x
(k)
(8.49)
y(k) = Cr x
r (k),
where
(8.50)
21
x
r (k) =
Ar =
Ir 0
Ir 0
x
(k)
Ir 0 B
Ir
= C
0
r =
B
Cr
Ir
0
(8.51)
8.2
Transfer Functions
8.2.1
Continuous Time
y(s)
m sm + m1 sm1 + + 1 s + 0
,
=
v(s)
sn +
n1 sn1 + +
1s +
0
m < n (8.52)
In the above, y(s) and v(s) denote the Laplace transforms of the output and the
input, respectively. This corresponds to an input-output relationship described
by the following time-domain differential equation:
dn y
dn1 y
dy
dm v
dm1 v
dv
+
n1
+ +
1 +
0 y = m
+m1
+ +1 +0 u
dt
dt
dt
dt
dt
dt
(8.53)
Taking the Laplace transform of (8.53) with the assumption that the system
is initially at rest yields the transfer function of (8.52).
n is the order of the transfer function. All physical systems are strictly
proper, that is, m < n, as outputs do not respond instantaneously to inputs.
Time delays can be incorporated conveniently into transfer functions. For
example, the same input output relationship but with an additional delay of
22
Modeling
is represented by
y(s)
m sm + m1 sm1 + + 1 s + 0 s
=
e
v(s)
sn +
n1 sn1 + +
1s +
0
8.2.2
(8.54)
Discrete Time
bnm q m + bnm+1 q m1 + + bn
v(k)
q n + a1 q n1 + + an
(8.56)
with q denoting the forward-shift operation (i.e., q ` {y(k)} = y(k + `)). Performing z-transform on the difference equation yields
G(z) =
y(z)
v(z)
G(z) is the discrete-time version of Laplace transfer function and is ofen referred to as pulse transfer function in the literature since it describes the output
response behavior to a series of pulse changes to the input. n m > 0 is the
relative degree. The roots of D(z) = 0 are the poles and the roots of N (z) = 0
the zeros of the transfer fuction. Poles and zeros are intimately linked with the
dynamic characteristics of the input-output system, as we will discuss shortly.
Dividing the top and bottom by z n gives
bnm z (nm) + bnm+1 z (nm+1) + + bn1 z (n1) + bn z n
1 + a1 z 1 + + an1 z (n1) + an z n
(8.58)
The corresponding time-domain expression is
G(z) =
(8.59)
23
So the model can be written using the delay operators (q 1 ) instead of the
forward-shift operators. The above form of the equation provides a useful
insight about the nature of the model: The current output is a linear combination of the n past inputs and outputs. The relative degree, (n m),
represents the systems time delay, measured in number of sample intervals.
Comments:
For convenience, we will often replace y(z) = G(z)v(z) with a time domain expression y(k) = G(q)v(k), which denotes the difference equation
(8.56) corresponding to the transfer function (8.57).
In the special case that ai = 0 for all i, we obtain
y(k) = bnm u(kn+m)+bnm+1 u(kn+m1)+ +bn1 u(kn+1)+bn u(kn)
(8.60)
With n chosen to correspond to the systems settling time, the above
is a Finite Impulse Response (FIR) model that we used throughout the
basic part of the book.
8.2.3
Transfer Matrix
G2,nv (z)
(8.61)
G(z) =
..
..
..
..
.
.
.
.
Gny ,1 (z)
Gi,j denotes the transfer function between the jth input and the ith output.
The above is referred to as the transfer matrix.
8.2.4
1 ehs s
1
Gc (s)
G(z) = Z L
(8.62)
s
where L1 and Z denote the inverse Laplace transformation and the z-transformation,
respectively. Note that the discretization requires multiplication of the continhs s
uous transfer function with 1es , which corresponds to a zero-order hold,
24
Modeling
hs s
is the Laplace transbefore taking the z- transform. Note that Gc (s) 1es
form of the output response to a discrete unit impulse a unit-size pulse that
starts at t = 0 and lasts for one sample interval. This is consistent with the
fact that z-transform of a discrete unit impulse is 1.
Conversion tables for common types of transfer functions are available in
many standard textbooks but they cannot be used directly when the system
includes fractional delays (non-integer-multiples of the sample time).
8.2.5
A pulse transfer matrix can be obtained from state-space model (8.5) by taking
the z-transform with x(0) = 0:
zx(z) = Ax(z) + Bv(z)
y(z) = Cx(z)
(8.63)
(8.64)
A state coordinate transformation does not change the transfer matrix since
(8.65)
1
G(z) = C1 (zI A11 )1 B
(8.66)
Deriving an expression for the transfer matrix of a system given in the form
of (8.46), which is obtained after the Kalmans decomposition, provides some
insight. Straightforward calculation of (8.64) yields
A noteworthy point is that only those matrices for the reachable and observable
part of the state appear in the expression. The parts of the state-space system
that are not reachable and/or unobservable do not affect the transfer function
representation.
Example 8.9 Consider the integrating system
d2 x
dt2
y(s)
u(s)
1
.
s2
The corresponding
x1
0
+
u
x2
1
x1
1 0
y =
x2
=
0 1
0 0
25
B=
hs
e
0
A c hs
Ac s
=I+
ds B =
0 1
0 0
hs h2s /2
0
hs
hs =
1 hs
0 1
"
0
1
h2s
2
hs
2
hs /2
1 hs
u(k)
x(k) +
hs
0 1
1 0 x(k)
y(k) =
x(k + 1) =
Performing the z-transform on the state-space equation, we see that the corresponding transfer function is
G(z) = C(zI A)
B=
1 0
z 1 hs
0
z1
h2s /2
hs
h2s (z + 1)
2(z 1)2
1 1 ehs s
s2
s
=L
1
ehs s
s3
s3
t2
(t hs )2
S(t)
S(t hs )
2
2
where S(t) and S(t hs ) are unit step functions starting at time 0 and hs ,
respectively. First,
h2 (z + z 2 )
Z t2 = s
(z 1)3
h2 (1 + z)
Z (t hs )2 S(t hs ) = s
(z 1)3
t2
(t hs )2
S(t)
S(t hs )
2
2
h2s (z + z 2 ) h2s (1 + z)
h2s (z + 1)
=
2(z 1)3
2(z 1)2
26
Modeling
8.2.6
(8.67)
and
a1 a2 an
1
0
0
1
0 x(k) +
x(k + 1) =
..
..
.
.
0
1 0
0
b1 b2 bn x(k)
y(k) =
.. x(k) +
x(k + 1) =
an1 0 1
an 0 0 0
1 0 0 x(k)
y(k) =
a1
a2
..
.
1
0
0
1
1
0
0
..
.
0
b1
b2
..
.
bn1
bn
v(k)
(8.69)
v(k)
(8.70)
(8.69) and (8.70) are called the controllable canonical form and the observable
canonical form, respectively.
Realization of a transfer matrix can be a bit more tricky. If the transfer
matrix is given in a matrix polynomial form, it can be readily realized in one
of the canonical forms as before. For instance, suppose the transfer matrix is
given in the form of
(I + A1 z 1 + + An z n )y(z) = (B1 z 1 + + Bn z n )v(z) (8.71)
27
B1
A1 I 0
B2
A2
0 I 0
..
.
.
..
.. x(k) +
x(k + 1) =
. v(k)
.
An1 0 I
..
An 0 0 0
Bn
I 0 I x(k)
y(k) =
(8.72)
8.2.7
28
Modeling
SHOW FIGURE for constant damping line - match it with timedomain step response.
Figure 8.4: Pole Locations and the Step Response: Constant Damping Curves
8.2.8
The relative degree of a transfer function represents the delay between the
input and the output in number of sample time units. Transfer functions
derived from physical systems have relative degree of at least one; this means
there is at least one unit delay for all physical systems, for which outputs do
not respond instantaneously to an input change.
Finally, the existence of a zero outside the unit disk implies an inverse step
response.
Example 8.10 Show step responses of various transfer function.
29
For systems initially at rest, when a step input change of v() induces a final
output change of y(), K = y()/v() is called the gain of the system. For
stable systems, the gain can be computed easily using the transfer function
according to
K = lim G(z)
z1
(8.73)
This follows directly from the final value theorem, which states
lim y(t) = lim z y(z)
z1
(8.74)
Similarly, the systems frequency response, the amplitude ratio A.R. and
phase angle , can be computed readily from the transfer function:
A.R. = G(ej )
= tan1
Im[G(ej )]
Re[G(ej )]
(8.75)
(8.76)
8.3
8.3.1
For the discrete state-space description {A, B, C}, let us define B = [b1 , b2 , . . . , bnv ]
iT
h
and C = cT1 , cT2 , . . . , cTny , i.e., bi and ci are the ith column and row of B
and C. Let us also define the following discrete unit impulse starting at the
jth sample time:
j (t) =
Starting from an initial state of zero, the response of the system to an impulse
in the ith input channel at time t = 0
vi (t) = 0 (t)
vm (t) = 0 m 6= i
is
30
Modeling
i (0)
yimp
i (1)
yimp
i (2)
yimp
i (3)
yimp
..
.
=
=
=
=
0
Cbi
CAbi
CA2 bi
i (n) = CAn1 b
yimp
i
Hk =
h1,1,k
h2,1,k
..
.
h1,2,k
h2,2,k
..
.
...
...
hny ,1,k
hny ,2,k
. . . hny ,nv ,k
c1 Ak1 b1
k1 b
1
c2 A
=
..
cny Ak1 b1
h1,nv ,k
h2,nv1 ,k
..
.
c1 Ak1 b2
c2 Ak1 b2
..
.
...
...
c1 Ak1 bnv
c2 Ak1 bnv
= CAk1 B
..
cny Ak1 b2
Sk =
k
X
Hi
(8.77)
i=1
8.3.2
31
P
j v1 (j)j (t)
P
j v2 (j)j (t)
.
.
v(t) =
..
.
P
j vnv (j)j (t)
By superposition (i.e., by adding the responses of all individual impulses occurring in different input channels at different times), the response at time k
to this sequence is
y(k) =
nv
k1 X
X
[Hkj ]i vi (j) =
j= i=1
k1
X
Hkj v(j) =
j=
X
`=1
H` v(k `)
(8.78)
n
X
i=1
Hi v(k i)
(8.80)
n should be chosen sufficiently large so that the discarded terms are negligible.
The z-transform of the FIR model can be written as
y(z)
1
1
1
= H1 + H2 2 + + H n n
v(z)
z
z
z
(8.81)
Hence, the transfer matrix for an FIR model has all its poles at the origin
representing the delay operations.
32
Modeling
8.3.3
0
I
0
.
.
.
..
.
0
0
A =
0
0
I
0
0
0
... ...
0 ...
0 ...
0
0
0
..
.
.
.
0 ... ... ... 0
0 ... ... ... I
T
I 0 0
B =
H1 H2 H n
C =
..
..
..
.
..
n nv
(8.83)
v T (k 1) v T (k 2) v T (k n)
(8.84)
(8.85)
where
..
A =
.
. . . . . . . . . 0 I
I 0 0
C =
0 I
0 0
.
..
0 0
0 0
0
I
... ... 0
0 ... 0
n ny
(8.86)
We note that the impulse response or step response based state-space realization is a highly over-parameterized, nonminimal realization. In general,
one should be able to reduce the system order substantially by applying the
model reduction. Balanced truncation is a popular model reduction method
and gives some numerical advantages when applied to the above realization.
33
1)
+
n
i=1 Hi v(k + n i 1)
Pn
y(k + n + `) =
i=1 Hi v(k + n + ` i), ` 0
Now let us assume that the input is kept constant after time k1:
v(k) = v(k + 1) = . . . . Then, the output is
v(k1) =
P
y(k) = Pni=1 Hi v(k i)
y(k + 1) = Pni=2 Hi v(k + 1 i) + H1 v(k 1)
n
y(k + 2) =
i=3 Hi v(k + 2 i) + (H1 + H2 )v(k 1)
..
.
P
y(k + n 1) = Hn v(k 1) + n1
i=1 Hi v(k 1)
y(k + n + `) = y(k + n 1), ` 0
Now define the vector
Y (k) =
Y (k), the future output sequence assuming no more change occurs in the input,
can be interpreted as one possible choice for the state vector, as it represents
the effect of all past input moves on the future outputs.
Based on the same definition, the state vector at time k + 1 is
Y (k + 1) =
for v(k + `) = v(k), ` > 0
(8.87)
With the assumption that the system input v is kept constant after time k
(v(k) = v(k + 1) = v(k + 2) = . . .), the system output is
P
y(k + 1) = Pni=2 Hi v(k + 1 i) + H1 v(k)
n
y(k + 2) =
i=3 Hi v(k + 2 i) + (H1 + H2 )v(k)
..
.
Pn1
y(k + n 1) = H
v(k
1)
+
n
i=1 Hi v(k)
Pn
y(k + n) =
i=1 Hi v(k)
y(k + n + `) = y(k + n), ` 0
34
Modeling
(8.88)
where
A =
0
0
.
..
0
0
I
0
0
I
... ... 0
0 ... 0
0
0
I 0 0
C =
..
.
(8.89)
This is indeed the same step response based model (????) that we adopted
in the earlier discussion of industrial MPC. Unlike in the realizations based on
impulse responses, the input appears as v rather than v in the above system.
Despite the fact that the original system is stable, the above model contains n y
integrators, which reflect the integrating effect of v on output y. This model
form will be useful in designing MPC controllers that automatically possess
the integral action.
8.4
Summary
Figure 8.5 shows a diagram summarizing the procedures for converting one
form of model to another. Materials in the rest of the book assume a discretetime state-space model; the diagram illustrates the fact that one can obtain
such a model form starting from any model form. In addition, we can see from
the diagram that several possible routes exist for most conversions.
Example 8.11 As a simple example, consider the continuous transfer function
g 0 (s) =
e2s
.
s+1
1 1 es
0.6321
1
Z L
=
s+1
s
z 0.3679
we obtain the pulse transfer function
g(z) = z 2
0.6321
z 0.3679
35
36
Modeling
0.3679
A = 1.0
0
8.5
0
0
1.0
0
1
0 ; B = 0 ; C = [0
0
0
0.6321]
Disturbance Modeling
One of the main reasons for control is to suppress the effect of external disturbances on key process variables. It is sometimes possible to eliminate the
source of disturbances entirely through design modifications, but more often
their effects need to be offset by adjusting manipulated variables. In designing
a controller that performs this task in an efficient manner, it is helpful to have
a model that enables prediction of disturbances influence on the outputs of
interest on the basis of measurement signals. While such a model may be
constructed from first principles or system identification, it is not necessary
for the model to include exact physical sources for all disturbances. For linear
controller design, it is sufficient to include in the model their combined overall
effect on the output. This is important since in many cases it is not even
possible to determine the physical sources of disturbances. In this section, we
will discuss stochastic models of disturbances. We will also show how such
disturbance models can be integrated with deterministic system models for
estimation and control.
8.5.1
37
(8.90)
T
E{xw (k)xTw (k)} = Aw E{xw (k 1)xTw (k 1)}ATw + Bw R Bw
(8.91)
(8.92)
(8.93)
Thus,
w
= limk E{w(k)} = 0
Rw ( ) = limk E{w(k + )w T (k)} = Cw Aw Pw CwT + R
(8.94)
In the limit,(8.90) becomes a stationary process, the mean and the covariance
of which are independent of the initial condition, as shown by (8.94).
Choosing (Aw , Bw , Cw ) to match a given Rw ( ) according to (8.94) is not
straightforward but some numerical approaches are available. These will be
discussed in a later chapter that covers system identification.
Transfer Function
Stationary stochastic processes can also be described using transfer matrices.
The general form is
w(k) = H(q)(k)
(8.95)
38
Modeling
8.5.2
39
(8.97)
(8.98)
Since
the covariance of (k) grows without bound as k and hence (k) is a
non-stationary stochastic process.
More generally, a stationary signal superimposed on pure jumps or Brownian motion can be used to model persistent or drifting disturbances. Such
a superimposed signal can be described by a linear system driven by an integrated white noise (see Figure 8.7).
xw (k + 1) = Aw xw (k) + Bw int (k)
w(k) = Cw xw (k) + int (k)
(8.99)
(8.100)
Note that the external noise (k) is now a white noise sequence, which is
the standard way to write models in optimal estimation and control. Note
that w(k) becomes a zero-mean stationary process. Any covariance can be
matched by appropriately choosing the state-space matrices.
40
Modeling
(8.101)
In this case too, it is convenient to express the model in terms of the difference
variables:
w(k) = H(q)(k)
8.5.3
(8.102)
(8.103)
where 1 (k) and 2 (k) are zero-mean white noises. The above model is best
interpreted as the superposition of two models,
xd (k + 1) = Axd (k) + Bv(k)
yd (k) = Cxd (k)
(8.104)
(8.105)
and
(8.106)
So the first system models the effect of deterministic inputs and the second
system charaeterizes the residual vector in a statistical sense (i.e., in terms of
its covariance function).
Such a model can be obtained through either fundamental modeling or
system identification. However, the model should not be mistaken as one
that is obtained by simple addition of white noise inputs to a deterministic
system model. In general, this would not result in a good statistical description
of the residual and eventually lead to a poor prediction performance. In the
identification section, we will discuss some methods to obtain such a combined
model from input output data.
41
Possible Exercises
1. Prove the observability condition using the Cayley Hamilton theorem.
In what sense does the null space of Wno represent the unobservable
subspace?
Solution: To see how this condition arises, first notice that observability
is equivalent to the fact that no nonzero initial state with zero input
result in a zero output response. Note that
y(0)
C
y(1) CA
(8.107)
=
x(0)
..
..
.
.
CAk1
y(k 1)
T
Let us denote C T (CA)T (CAk1 )T
as Wko . For a system
o
to be unobservable, the null space of Wk must be nontrivial for all finite
ks. Then, observability implies the existence of a finite k such that
rank(Wko ) = n. Since rank(Wko ) rank(Wno ) for all k, the condition
reduces to the rank condition of (8.39).
2. Prove the Hautus condition for stabilizability.
Solution: The condition for stabilizability can be proved in a similar
way. First, we prove
Stabilizable
rank A I B = n +
A
rank
A I B
rank A I B 6= n
6= n
for some +
A
for some +
A
Following the same argument as before, we can show that, in this case,
there is an eigenvector of A (denoted by x) corresponding to an unstable
eigenvalue such that
xT B AB An1 B = 0T
rank
A I B
= n +
A
42
Modeling
Again, we can prove instead
Not
stabilizable
rank
A I B
6= n
for some +
A
The assumption of the system being not stabilizable implies that there
exists x outside the reachable subspace such that
xT B AB An1 B = 0T
This means
xT B = 0, xT AB = 0, , , xT An1 B = 0
This implies that x is an eigenvector of A corresponding to an unstable
eigenvalue and xT B = 0. This, in turn, implies that
xT A I B = 0T for some +
A
3. Derive the final value theorem for the z-transform and use it to prove that
system gain can be obtained by setting z = 1 in the transfer function.
4. Derive the equation y(z) = H(z)v(z) where H(z) is the z-transform of
the sequence of impulse response coefficient matrices {Hk }. Do this by
z-transforming the convolution equation.
5. Show that z-transform of the observer canonical form and the controller
canonical form of state-space realization indeed leads to the same transfer
function.
6. Consider the following MIMO system:
"
120s+2
G(s) =
(100s+1)(20s+1)
80s
(100s+1)(20s+1)
80s
(100s+1)(20s+1)
(120s+2)
(100s+1)(20s+1)
A1 1
B11
0
A
0
B
12
12
x(k) +
x(k + 1) =
B21
A21
0
A22
0 B22
C11 C12 0
0
y(k) =
x(k)
0
0 C21 C22
This
u(k)
43
References
1. Linear systems text books, Kwakernaak and Sivan, Astrom and Wittenmark....
2. Further information on reachability, controllability, observability, etc.,
esp. the extension to continuous-time systems and time-varying systems,
and other linear system concepts can be found in Kwakernaak and Sivan.
Some insightful but limited treatments of these concepts can also be
found in Astrom and Wittenmark.
3. z-transform and pulse transfer function table in Astrom and Wittenmark.
4. References in stochastic modeling, esp. spectral factorization theorem.
Jazwinski gives a terse treatment of the subject within the confines of
optimal estimation. Astrom and Wittenmark discusses stochastic modeling of disturbances in the general setting of optimal estimation and
control.
44
8.6
Modeling
Model Reduction
8.6.1
(8.108)
x
(k + 1) = A
y(k) = C x
(k)
(8.109)
45
r v(k)
x
r (k + 1) = Ar x
r (k) + B
y(k) = Cr x
r (k),
(8.110)
where
x
r (k) =
Ar =
Ir 0
Ir 0
x
(k)
Ir 0 B
Ir
= C
0
r =
B
Cr
Ir
0
(8.111)
8.6.2
Wo =
B AB A2 B
CT
(CA)T
(CA2 )T
(8.112)
(8.113)
Recall that the truncated versions of the above matrices (Wnc and Wno ) were
used for checking reachability and observability, which accounts for their names.
Hankel matrix HG is defined as
HG = W o W c
(8.114)
and
x(0) = W c
y(0)
y(1)
y(2)
..
.
u(1)
u(2)
u(3)
..
.
= W o x(0)
(8.115)
(8.116)
46
Modeling
W c maps the past inputs u(, 0) to the current state x(0) which is subsequently mapped to the current / future outputs y[0, ) via W o . Assuming
the original system is controllable and observable, W c and W o both have rank
of n and therefore HG has rank n. In this case, HG has n nonzero singular
values defined as
i (HG ) = i (W o W c )
1/2
= i ((W c )T (W o )T W o W c )
1/2
= i (W c (W c )T (W o )T W o )
(8.117)
Here i () denotes the ith eigenvalue. i (HG )s are called Hankel singular values
of G and their magnitudes reflect the relative importance of the various modes
of Hankel matrix HG in describing the input-output dynamics. The largest
singular value
(HG ) is called the Hankel norm of G.
Let us define
c T
= W (W ) =
Q = (W o )T W o =
j=0
Aj BB T AT j
(8.118)
AT j C T CAj
(8.119)
j=0
P and Q are called the controllability and observability gramians and satisfy
the following matrix equations respectively:
AP AT P + BB T
= 0
(8.120)
A QA Q + C C = 0
(8.121)
One can verify this by direct substitution of (8.118) and (8.119) into (8.120)
and (8.121). Equations of the above form are called Lyapunov equations and
their numerical properties have been studied extensively due to their common
occurrences in control applications. The ith Hankel singular value is simply [i (P Q)]1/2 where P and Q are positive-definite solutions to (8.120) and
(8.121) respectively. Hence, one must solve a pair of Lyapunov equations to
calculate the Hankel singular values. We remark that the Hankel matrix and
hence its singular values are properties intrinsic to the systems input-output
dynamics and are invariant under a state coordinate transformation.
One possible objective for model reduction is to transform the state coordinates such that, when the truncation of type (8.110) is made, the Hankel
matrix of the reduced-order system approximates that of the full-order system
as closely as possible. It has been shown that there exists an rth order model
Gr such that
47
(8.122)
8.6.3
(8.123)
Gr
1 (HG )
P = Q = HG =
..
.
n (HG )
(8.124)
This way, the last state is the least reachable and observable, the second to the
last is the second least reachable and observable, and so on. Before we derive
a procedure to obtain a balanced realization, let us note that, under a state
coordinate transformation of x
= T x, the controllability and observability
gramians are transformed as follows:
P = T P T T
= (T 1 )T QT 1
Q
(8.125)
(8.126)
(8.127)
The factorization of the above form is called Cholesky factorization. The state
coordinate transformation of x
1 = Rx gives
48
Modeling
(8.128)
(8.129)
P1 = RP RT
Q1 = (R1 )T QR1 = I
(8.130)
1/2
1/2
(8.131)
Q2 =
(8.132)
1/2
1/2
HG U1T U1 HG
= HG
1/2
(HG HGr ) 2
n
X
j (HG )
(8.133)
j=r+1
8.6.4
(8.134)
Recall the following state-space model, which was introduced as the observable
canonical realization of a FIR model.
49
(8.135)
where
..
A =
n
.
I 0 0
C =
0 I
0 0
.
..
0 0
0 0
0
I
... ... 0
0 ... 0
(8.136)
(8.137)
(8.138)
(8.121)
Because of the special structure of A, B, C matrices for the FIR model, we can
obtain explicit solutions to the above equations. The observability gramian
Q is simply an identity matrix. The controllability gramian P can also be
obtained analytically by using the following recursive formula:
T
Pj,n = Pn,j
= (BB T )j,n
Pi,j
1jn
(8.139)
(8.141)
B,
C)
can then be obtained by the state-coordinate
A balanced realization (A,
1/2
transformation of x
= HG U T x.
50
Modeling
20s + 1
(100s + 1)(20s + 1)
Determine the FIR coefficients of the system with the sample time of 5.
How many coefficients do you need to adequately describe the system?
Realize the FIR system and perform a balanced truncation as shown in
the lecture. How many significant Hankel singular values do you see?
That determines the minimum order of the system you need. Write
down the reduced order system in a balanced form.
SOLUTIONS are given in YR2000 MPC Class HW Solution (Copy
from the solution).
References
An optimal Hankel model reduction method can be found in Glover.
The balanced realization and truncation is due to Moore??????
The balanced truncation of the FIR model is drawn from ????, which
discusses a finite dimensional approximation of time delays.
51
8.6.5
A general form of ODEs derived from first principles modeling after linearization is
c u + Bc w
x p = Acp xp + B1,p
2,p
(8.142)
y = Cp x + D2,p w
where xp is the state, u the manipulated input, w the disturbance input and
y the output. Subscript p is used throughout to distinguish the state and the
model matrices for the process from those for disturbances that we introduce
later. The above model can be discretized according to formula (8.13) (the
formula for B can be applied to both B1,p and B2,p ):
xp (k + 1) = Ap x(k) + B1,p u(k) + B2,p w(k)
y(k) = Cp xp (k) + D2,p w(k)
(8.143)
52
Modeling
(8.144)
where (k) is white noise with covariance R and Aw is a matrix with all its
eigenvalues strictly inside the unit disk.
One can augment (8.143) with (8.144) to arrive at the following model:
xp (k)
Ap B2,p Cw
B1,p
B2,p
xp (k + 1)
=
+
u(k) +
(k)
0
Aw
0
xw (k)
Bw
xw (k + 1)
|
| {z }
{z
}
{z
} | {z } | {z }
|
A
B1
B2
x(k+1)
x(k)
xp (k)
Cp D2,p Cw
+ D2,p (k)
y(k) =
|{z}
{z
} xw (k)
|
| {z }
D
C
x(k)
(8.145)
(8.146)
2 (k)
Note that the state is now expanded to include both the original process model
state xp and disturbance state xw .
Augmented System Model for Cases with Persistent Disturbances
For nonstationary disturbances with persistent characterisics, we earlier introduced the stochastic model
xw (k + 1) = Aw xw (k) + Bw (k)
w(k) = Cw xw (k) + (k)
(8.147)
The system model (8.143) can also be written in the differenced form,
xp (k + 1) = Ap xp (k) + B1,p u(k) + B2,p w(k)
y(k) = Cp xp (k) + D2,p w(k)
(8.148)
53
B2,p
Ap B2,p Cw
xp (k)
B1,p
xp (k + 1)
u(k) +
(k)
=
+
Bw
0
Aw
0
xw (k)
xw (k + 1)
| {z }
|
|
{z
}
{z
}|
{z
} | {z }
A
B1
B2
x(k+1)
x(k)
xp (k)
Cp D2,p Cw
y(k) =
+ D2,p (k)
|{z}
|
{z
} xw (k)
|
{z
}
D2
C
x(k)
(8.149)
For estimation and control, it is further desired that the model output be y
rather than y. This requires yet another augmentation of the state with
output y according to
x(k + 1)
A 0
x(k)
B1
=
+
u(k)
y(k + 1)
y(k)
CB
CA I
1
0
B2
+
(k + 1) +
(k)
(8.150)
D2
DB2
x(k)
0 I
y(k) =
y(k)
x(k) + B
1 u(k) + 1 (k)
x
(k + 1) = A
y(k) = C x
(k) + 2 (k)
(8.151)
except that now the system input is u rather than u. Note that this system
has ny integrators, which reflect the integrated effect of the white noise input
1 (k) and and the system input u on the output. This particular form will
be used for the development of advanced MPC techniques.
8.6.6
G(q) H(q) ,
x(k + 1) = A(k) + B1 u(k) + B2 (k)
y(k) = Cx(k) + D1 u(k) + D2 (k)
(8.153)
54
Modeling
From the fact that G(q) has relative degree of at least one and it is conventional
to assume without loss of generality that H(0) = I, D1 = 0 and D2 = I.
Hence, the above is in the same form as (8.146)
As before, in the case that the disturbance effects are non-stationary exhibiting mean shifts, the driving noise should be integrated white noise.
y(k) = G(q)u(k) + H(q)
1
(q)
1 q 1
{z
}
|
(8.154)
int (q)
(8.155)
(8.156)
8.6.7
Examples
Take a chemical process and show how to put everything together both via
fundamental modelling and identification here.
MODEL PREDICTIVE
CONTROL
Manfred Morari
Jay H. Lee
Carlos E. Garca
Chapter 1
RANDOM VARIABLES
INTRODUCTION
What Is Statistics?
Statistics deals with the application of probability theory to real problems.
There are two basic problems in statistics.
Given a probabilistic model, predict the outcome of future trial(s). For
instance one may say:
choose the prediction x
such that expected value of (x x
)2 is
minimized.
Given collected data, define / improve a probabilistic model.
For instance, there may be some unknown parameters (say ) in the
probabilistic model. Then, given data X generated from the particular
probabilistic model, one should construct an estimate of in the form
of (X).
For example, (X)
may be constructed based on the objective
2.
of minimizing expected value of k k
2
Another related topic is hypothesis testing, which has to do with testing
whether a given hypothesis is correct (i.e, how correct defined in terms
of probability), based on available data.
In fact, one does both. That is, as data come in, one may continue to
improve the probabilistic model and use the updated model for further prediction.
A priori Knowledge
Error
feedback
Predictor
PROBABILISTIC
MODEL
ACTUAL
SYSTEM
1.1.2
X
+
F( ;d)
P(;d)
Note that
P(; d)d =
dF (; d) = 1
(1.3)
In addition,
Z
b
a
P(; d) d =
b
a
)
1
1 m 2
P(; d) =
exp
2
2 2
(1.5)
P( ;d)
m-
68.3%
m +
T
Let d = [d_1 ⋯ d_n]^T be a continuous random variable vector (d ∈ R^n). Now we must quantify the distribution of its individual elements as well as their correlations.

Joint Probability Distribution Function

The joint probability distribution function F(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) for random variable vector d is defined as

F(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) = Pr{d_1 ≤ ξ_1, ⋯, d_n ≤ ξ_n}     (1.6)
[Figure: joint density function of a two-dimensional random variable vector.]

The joint density P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) satisfies

∫ ⋯ ∫ P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) dξ_1 ⋯ dξ_n = 1     (1.9)

If

P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) = P(ξ_1; d_1) ⋯ P(ξ_n; d_n)     (1.12)

then d_1, ⋯, d_n are called mutually independent.
Example: Gaussian or Jointly Normally Distributed Variables

Suppose that d = [d_1 d_2]^T is a Gaussian variable. The density takes the form of

P(ξ_1, ξ_2; d_1, d_2) = 1/(2πσ_1σ_2(1−ρ²)^{1/2}) exp{ −1/(2(1−ρ²)) [ ((ξ_1−m_1)/σ_1)² − 2ρ(ξ_1−m_1)(ξ_2−m_2)/(σ_1σ_2) + ((ξ_2−m_2)/σ_2)² ] }     (1.13)
The marginal densities are obtained by integrating out the other variable:

P(ξ_1; d_1) = ∫ P(ξ_1, ξ_2; d_1, d_2) dξ_2 = 1/√(2πσ_1²) exp{ −(1/2)((ξ_1−m_1)/σ_1)² }     (1.14)

P(ξ_2; d_2) = ∫ P(ξ_1, ξ_2; d_1, d_2) dξ_1 = 1/√(2πσ_2²) exp{ −(1/2)((ξ_2−m_2)/σ_2)² }     (1.16)
Hence, (m_1, σ_1) and (m_2, σ_2) represent parameters for the marginal density of d_1 and d_2 respectively. Note also that

P(ξ_1, ξ_2; d_1, d_2) ≠ P(ξ_1; d_1) P(ξ_2; d_2)     (1.18)

except when ρ = 0.
A general n-dimensional Gaussian random variable vector d = [d_1, ⋯, d_n]^T has a density function of the following form:

P(ξ; d) = P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) = 1/((2π)^{n/2} |P_d|^{1/2}) exp{ −(1/2)(ξ − d̄)^T P_d^{-1} (ξ − d̄) }     (1.19)
EXPECTATION OF RANDOM VARIABLES AND RANDOM VARIABLE FUNCTIONS: SCALAR CASE

The expectation of a function f of a random variable d is

E{f(d)} = ∫ f(ξ) P(ξ; d) dξ     (1.21)

In particular, the mean and variance of d are

d̄ = E{d} = ∫ ξ P(ξ; d) dξ     (1.22)

Var{d} = E{(d − d̄)²} = ∫ (ξ − d̄)² P(ξ; d) dξ     (1.23)

For the Gaussian density

P(ξ; d) = (1/√(2πσ²)) exp{ −(ξ − m)²/(2σ²) }     (1.24)

one obtains

d̄ = E{d} = ∫ ξ (1/√(2πσ²)) exp{ −(ξ − m)²/(2σ²) } dξ = m     (1.25)

Var{d} = E{(d − d̄)²} = ∫ (ξ − m)² (1/√(2πσ²)) exp{ −(ξ − m)²/(2σ²) } dξ = σ²     (1.26)

Hence, m and σ² that parametrize the normal density represent the mean and the variance of the Gaussian variable.
EXPECTATION OF RANDOM VARIABLES AND RANDOM VARIABLE FUNCTIONS: VECTOR CASE

We can extend the concepts of mean and variance similarly to the vector case. Let d be a random variable vector that belongs to R^n.

d̄_ℓ = E{d_ℓ} = ∫ ξ_ℓ P(ξ_ℓ; d_ℓ) dξ_ℓ = ∫ ⋯ ∫ ξ_ℓ P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) dξ_1 ⋯ dξ_n     (1.27)

Var{d_ℓ} = E{(d_ℓ − d̄_ℓ)²} = ∫ (ξ_ℓ − d̄_ℓ)² P(ξ_ℓ; d_ℓ) dξ_ℓ = ∫ ⋯ ∫ (ξ_ℓ − d̄_ℓ)² P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) dξ_1 ⋯ dξ_n     (1.28)

In the vector case, we also need to quantify the correlations among different elements:

Cov{d_ℓ, d_m} = E{(d_ℓ − d̄_ℓ)(d_m − d̄_m)} = ∫ ⋯ ∫ (ξ_ℓ − d̄_ℓ)(ξ_m − d̄_m) P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) dξ_1 ⋯ dξ_n     (1.29)
Note that

Cov{d_ℓ, d_ℓ} = Var{d_ℓ}     (1.30)

The ratio

ρ = Cov{d_ℓ, d_m} / √(Var{d_ℓ} Var{d_m})     (1.31)

is called the correlation coefficient of d_ℓ and d_m. The covariance matrix of d is defined as

Cov{d} = E{(d − d̄)(d − d̄)^T} = ∫ ⋯ ∫ (ξ − d̄)(ξ − d̄)^T P(ξ_1, ⋯, ξ_n; d_1, ⋯, d_n) dξ_1 ⋯ dξ_n     (1.32)

The (i, j)th element of Cov{d} is Cov{d_i, d_j}. The diagonal elements of Cov{d} are variances of elements of d. The above matrix is symmetric since

Cov{d_i, d_j} = Cov{d_j, d_i}     (1.33)
Example: Gaussian Variables 2-Dimensional Case

Let d = [d_1 d_2]^T be Gaussian with density

P(ξ; d) = 1/(2πσ_1σ_2(1−ρ²)^{1/2}) exp{ −1/(2(1−ρ²)) [ ((ξ_1−m_1)/σ_1)² − 2ρ(ξ_1−m_1)(ξ_2−m_2)/(σ_1σ_2) + ((ξ_2−m_2)/σ_2)² ] }     (1.36)

Then,

E{d} = ∫∫ [ξ_1; ξ_2] P(ξ; d) dξ_1 dξ_2 = [m_1; m_2]     (1.37)

Cov{d} = ∫∫ [ξ_1 − m_1; ξ_2 − m_2][ξ_1 − m_1  ξ_2 − m_2] P(ξ; d) dξ_1 dξ_2 = [σ_1²  ρσ_1σ_2; ρσ_1σ_2  σ_2²]     (1.38)
Example: Gaussian Variables n-Dimensional Case

Let d = [d_1 ⋯ d_n]^T and

P(ξ; d) = 1/((2π)^{n/2} |P_d|^{1/2}) exp{ −(1/2)(ξ − d̄)^T P_d^{-1} (ξ − d̄) }     (1.39)

Then,

E{d} = ∫ ⋯ ∫ ξ P(ξ; d) dξ_1 ⋯ dξ_n = d̄     (1.40)

Cov{d} = ∫ ⋯ ∫ (ξ − d̄)(ξ − d̄)^T P(ξ; d) dξ_1 ⋯ dξ_n = P_d     (1.41)

Hence, d̄ and P_d that parametrize the normal density function P(ξ; d) represent the mean and the covariance matrix.

Exercise: Verify that, with

d̄ = [m_1; m_2];  P_d = [σ_1²  ρσ_1σ_2; ρσ_1σ_2  σ_2²]     (1.42)

one obtains the expression for the normal density of a 2-dimensional vector shown earlier.
NOTE: Use of SVD for Visualization of Normal Density

Covariance matrix P_d contains information about the spread (i.e., extent of deviation from the mean) for each element and their correlations. For instance,

Var{d_ℓ} = [Cov{d}]_{ℓ,ℓ}     (1.43)

ρ{d_ℓ, d_m} = [Cov{d}]_{ℓ,m} / √([Cov{d}]_{ℓ,ℓ} [Cov{d}]_{m,m})     (1.44)

where [·]_{i,j} represents the (i, j)th element of the matrix. However, one still has a hard time understanding the correlations among all the elements and visualizing the overall shape of the density function. Here, the SVD can be useful. Because P_d is a symmetric matrix, it has the following SVD:

P_d = E{(d − d̄)(d − d̄)^T} = V Σ V^T = [v_1 ⋯ v_n] diag(σ_1, ⋯, σ_n) [v_1^T; ⋮; v_n^T]     (1.45)

Hence,

E{V^T (d − d̄)(d − d̄)^T V} = diag(σ_1, ⋯, σ_n)     (1.48)

That is, with d′ = V^T d,

E{(d′ − d̄′)(d′ − d̄′)^T} = diag(σ_1, ⋯, σ_n)     (1.49)

so the elements of the transformed vector d′ are mutually uncorrelated and σ_1, ⋯, σ_n are their variances.
For example,

P_d = [20.2  19.8; 19.8  20.2] = [√2/2  √2/2; √2/2  −√2/2] [40  0; 0  0.4] [√2/2  √2/2; √2/2  −√2/2]^T     (1.50)

so the density is strongly stretched along v_1 = [√2/2; √2/2] (variance 40) and thin along v_2 = [√2/2; −√2/2] (variance 0.4).
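A quick numerical check of this decomposition (a sketch of ours using NumPy; `eigh` applies since P_d is symmetric):

```python
import numpy as np

# Covariance matrix from the example above.
Pd = np.array([[20.2, 19.8],
               [19.8, 20.2]])

# For a symmetric matrix, the SVD coincides with the eigendecomposition.
sig, V = np.linalg.eigh(Pd)          # sig = [0.4, 40.0] (ascending order)

print(sig)                           # variances along the principal axes
print(V)                             # columns are the directions v_i

# Reconstruct Pd = V diag(sig) V^T to confirm the factorization.
print(np.allclose(V @ np.diag(sig) @ V.T, Pd))   # True
```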
CONDITIONAL DENSITY FUNCTION

For two random variables x and y with joint density P(ξ, η; x, y), the conditional density of x given y = η is defined as

P(ξ|η; x|y) = P(ξ, η; x, y) / ∫ P(ξ, η; x, y) dξ = P(ξ, η; x, y) / P(η; y)     (1.53)

where the denominator acts as a normalization factor. Note:

The above means

∫ P(ξ|η; x|y) dξ = 1     (1.55)

P(ξ|η; x|y) = P(ξ; x) if and only if P(ξ, η; x, y) = P(ξ; x) P(η; y). This means that the conditional density is the same as the marginal density when and only when x and y are independent.

We are interested in the conditional density, because often some of the random variables are measured while others are not. For a particular trial, if x is not measurable, but y is, we are interested in knowing P(ξ|η; x|y) for estimation of x.
Finally, note the distinctions among the different density functions. The joint density gives

Pr{a_1 < x ≤ b_1, a_2 < y ≤ b_2} = ∫_{a_2}^{b_2} ∫_{a_1}^{b_1} P(ξ, η; x, y) dξ dη

while the conditional densities are

P(ξ|η; x|y) = P(ξ, η; x, y) / P(η; y)     (1.61)

P(η|ξ; y|x) = P(ξ, η; x, y) / P(ξ; x)     (1.62)
Bayes Rule:

Note that

P(ξ|η; x|y) = P(ξ, η; x, y) / P(η; y)     (1.63)

P(η|ξ; y|x) = P(ξ, η; x, y) / P(ξ; x)     (1.64)

Hence, we arrive at

P(ξ|η; x|y) = P(η|ξ; y|x) P(ξ; x) / P(η; y)     (1.65)

Bayes Rule is useful since, in many cases, we are trying to compute P(ξ|η; x|y) and it is difficult to obtain the expression for it directly, while it may be easy to write down the expression for P(η|ξ; y|x).
We can define the concepts of conditional expectation and conditional covariance using the conditional density. For instance, the conditional expectation of x given y = η is defined as

E{x|y} = ∫ ξ P(ξ|η; x|y) dξ     (1.68)
Example: Consider jointly Gaussian scalars x and y with means x̄, ȳ, standard deviations σ_x, σ_y and correlation coefficient ρ. The joint density

P(ξ, η; x, y) = 1/(2πσ_xσ_y√(1−ρ²)) exp{ −1/(2(1−ρ²)) [ ((ξ−x̄)/σ_x)² − 2ρ(ξ−x̄)(η−ȳ)/(σ_xσ_y) + ((η−ȳ)/σ_y)² ] }

can be factored in two ways:

P(ξ, η; x, y) = [ 1/√(2πσ_y²) exp{ −(1/2)((η−ȳ)/σ_y)² } ] × [ 1/(√(2π)σ_x√(1−ρ²)) exp{ −(ξ − x̄ − ρ(σ_x/σ_y)(η−ȳ))² / (2σ_x²(1−ρ²)) } ]     (1.72)

where the first factor is the marginal density of y and the second the conditional density of x, and

P(ξ, η; x, y) = [ 1/√(2πσ_x²) exp{ −(1/2)((ξ−x̄)/σ_x)² } ] × [ 1/(√(2π)σ_y√(1−ρ²)) exp{ −(η − ȳ − ρ(σ_y/σ_x)(ξ−x̄))² / (2σ_y²(1−ρ²)) } ]     (1.73)

where the first factor is the marginal density of x and the second the conditional density of y.
Hence,

P(ξ|η; x|y) = 1/(√(2π)σ_x√(1−ρ²)) exp{ −(ξ − x̄ − ρ(σ_x/σ_y)(η−ȳ))² / (2σ_x²(1−ρ²)) }     (1.74)

P(η|ξ; y|x) = 1/(√(2π)σ_y√(1−ρ²)) exp{ −(η − ȳ − ρ(σ_y/σ_x)(ξ−x̄))² / (2σ_y²(1−ρ²)) }     (1.75)

Note that the above conditional densities are normal. For instance, P(ξ|η; x|y) is a normal density with mean of x̄ + ρ(σ_x/σ_y)(η − ȳ) and variance of σ_x²(1−ρ²).
So,

E{x|y} = x̄ + ρ(σ_x/σ_y)(η − ȳ)     (1.76)
       = x̄ + (ρσ_xσ_y/σ_y²)(η − ȳ)     (1.77)
       = E{x} + Cov{x, y} Var^{-1}{y} (η − E{y})     (1.78)

Similarly,

Cov{x|y} = σ_x²(1 − ρ²)     (1.79)
         = σ_x² − (ρσ_xσ_y)(1/σ_y²)(ρσ_xσ_y)     (1.80)
         = Var{x} − Cov{x, y} Var^{-1}{y} Cov{y, x}     (1.81)
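These formulas are easy to sanity-check by simulation; the following sketch (our own illustration, not from the text) estimates E{x | y ≈ η} from correlated Gaussian samples and compares it with (1.76):

```python
import numpy as np

rng = np.random.default_rng(0)
xbar, ybar, sx, sy, rho = 1.0, -2.0, 2.0, 0.5, 0.8

# Draw correlated Gaussian pairs (x, y).
cov = np.array([[sx**2, rho*sx*sy],
                [rho*sx*sy, sy**2]])
x, y = rng.multivariate_normal([xbar, ybar], cov, size=500_000).T

eta = -1.5
# Empirical conditional mean: average x over samples with y near eta.
mask = np.abs(y - eta) < 0.01
print(x[mask].mean())                      # ~ E{x | y = eta}
print(xbar + rho*(sx/sy)*(eta - ybar))     # formula (1.76)
```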
In the vector case,

E{x|y} = ∫ ξ P(ξ|η; x|y) dξ_1 ⋯ dξ_n     (1.85)

Cov{x|y} = ∫ (ξ − E{x|y})(ξ − E{x|y})^T P(ξ|η; x|y) dξ_1 ⋯ dξ_n     (1.86)

Suppose x ∈ R^n and y ∈ R^m are jointly Gaussian, i.e., z = [x; y] has the density

P(ζ; z) = 1/((2π)^{(n+m)/2} |P_z|^{1/2}) exp{ −(1/2)(ζ − z̄)^T P_z^{-1} (ζ − z̄) }     (1.88)

where

z̄ = [x̄; ȳ];  P_z = [Cov(x)  Cov(x, y); Cov(y, x)  Cov(y)]     (1.89)

Then, as in the scalar case, the conditional density of x given y = η is normal with

E{x|y} = x̄ + Cov(x, y) Cov^{-1}(y) (η − ȳ)     (1.93)

Cov{x|y} = Cov(x) − Cov(x, y) Cov^{-1}(y) Cov(y, x)     (1.95)
1.1.3 STATISTICS

PREDICTION

The first problem of statistics is prediction of the outcome of a future trial given a probabilistic model.

Suppose P(x), the probability density for random variable x, is given. Predict the outcome of x for a new trial (which is about to occur).

Note that, unless P(x) is a point distribution, x cannot be predicted exactly. A common choice is the minimum-variance prediction

x̂ = arg min_{x̂} E{‖x − x̂‖²} = E{x}

If a related variable y (from the same trial) is given, then one should use x̂ = E{x|y} instead.
SAMPLE MEAN AND COVARIANCE, PROBABILISTIC MODEL

The other problem of statistics is inferring a probabilistic model from collected data. The simplest of such problems is the following:

We are given the data for random variable x from N trials. These data are labeled as x(1), ⋯, x(N). Find the probability density function for x.

Oftentimes, a certain density shape (like the normal distribution) is assumed to make it a well-posed problem. If a normal density is assumed, the following sample averages can then be used as estimates for the mean and covariance:

x̂̄ = (1/N) Σ_{i=1}^N x(i)

R̂_x = (1/N) Σ_{i=1}^N x(i) x^T(i)

Note that the above estimates are consistent estimates of the real mean and covariance x̄ and R_x (i.e., they converge to the true values as N → ∞).
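As an illustration (our own sketch), the sample averages can be computed and checked for consistency against the true parameters; zero-mean data are used so that R_x = E{x x^T} is indeed the covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
cov_true = np.array([[2.0, 0.6],
                     [0.6, 0.5]])

for N in (100, 10_000, 1_000_000):
    x = rng.multivariate_normal(np.zeros(2), cov_true, size=N)
    xbar_hat = x.mean(axis=0)      # sample mean, -> 0
    Rx_hat = x.T @ x / N           # sample covariance as in the text
    print(N, np.abs(xbar_hat).max(), np.abs(Rx_hat - cov_true).max())
    # both error measures shrink as N grows
```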
A slightly more general problem is:
1.2 STOCHASTIC PROCESSES
A stochastic process refers to a family of random variables indexed by a parameter set. This parameter set can be continuous or discrete. Since we are
interested in discrete systems, we will limit our discussion to processes with
a discrete parameter set. Hence, a stochastic process in our context is a time
sequence of random variables.
1.2.1 DISTRIBUTION FUNCTION

Let x(k) be a sequence. Then (x(k_1), ⋯, x(k_ℓ)) form an ℓ-dimensional random variable, and one can define the finite-dimensional distribution function and density function as before. For instance, the distribution function F(ξ_1, ⋯, ξ_ℓ; x(k_1), ⋯, x(k_ℓ)) is defined as:

F(ξ_1, ⋯, ξ_ℓ; x(k_1), ⋯, x(k_ℓ)) = Pr{x(k_1) ≤ ξ_1, ⋯, x(k_ℓ) ≤ ξ_ℓ}     (1.97)
The mean of the process is

x̄(k) = E{x(k)} = ∫ ξ dF(ξ; x(k))     (1.98)

Its covariance is defined as

R_x(k_1, k_2) = E{[x(k_1) − x̄(k_1)][x(k_2) − x̄(k_2)]^T}
             = ∫∫ [ξ_1 − x̄(k_1)][ξ_2 − x̄(k_2)]^T dF(ξ_1, ξ_2; x(k_1), x(k_2))     (1.99)

The cross-covariance of two stochastic processes x(k) and y(k) is defined as

R_xy(k_1, k_2) = E{[x(k_1) − x̄(k_1)][y(k_2) − ȳ(k_2)]^T}
              = ∫∫ [ξ_1 − x̄(k_1)][ξ_2 − ȳ(k_2)]^T dF(ξ_1, ξ_2; x(k_1), y(k_2))     (1.100)

Gaussian processes refer to the processes of which any finite-dimensional distribution function is normal. Gaussian processes are completely characterized by the mean and covariance.
STATIONARY STOCHASTIC PROCESSES

Throughout this book we will define stationary stochastic processes as those with a time-invariant distribution function. Weakly stationary (or stationary in a wide sense) processes are processes whose first two moments are time-invariant. Hence, for a weakly stationary process x(k),

E{x(k)} = x̄     ∀k
E{[x(k) − x̄][x(k − τ) − x̄]^T} = R_x(τ)     ∀k     (1.101)

In other words, if x(k) is stationary, it has a constant mean value and its covariance depends only on the time difference τ. For Gaussian processes, weakly stationary processes are also stationary.

For scalar x(k), R(0) can be interpreted as the variance of the signal, and the normalized covariance R(τ)/R(0) reveals its time correlation: a magnitude of 1 indicates complete correlation and a value of 0 indicates no correlation.
Note that many signals have both deterministic and stochastic components. In some applications, it is very useful to treat these signals in the same framework. One can do this by defining

x̄ = lim_{N→∞} (1/N) Σ_{k=1}^N x(k)
R_x(τ) = lim_{N→∞} (1/N) Σ_{k=1}^N [x(k) − x̄][x(k − τ) − x̄]^T     (1.102)

Note that in the above, both deterministic and stochastic parts are averaged out. The signals for which the above limits converge are called quasi-stationary signals. The above definitions are consistent with the previous definitions since, in the purely stochastic case, a particular realization of a stationary stochastic process with given mean (x̄) and covariance (R_x(τ)) should satisfy the above relationships.
The spectral density of a stationary process x(k) is defined as

Φ_x(ω) = (1/2π) Σ_{τ=−∞}^{∞} R_x(τ) e^{−jωτ}     (1.103)

The area under the curve represents the power of the signal for the particular frequency range. For example, the power of x(k) in the frequency range (ω_1, ω_2) is calculated by the integral

∫_{ω=ω_1}^{ω_2} Φ_x(ω) dω

The inverse relation is

R_x(τ) = ∫_{−π}^{π} Φ_x(ω) e^{jωτ} dω     (1.104)

In particular,

E{x(k) x^T(k)} = R_x(0) = ∫_{−π}^{π} Φ_x(ω) dω     (1.105)

which indicates that the total area under the spectral density is equal to the variance of the signal. This is known as Parseval's relationship.

Example: Show plots of various covariances, spectra and realizations!

**Exercise: Plot the spectra of (1) white noise, (2) sinusoids, and (3) white noise filtered through a low-pass filter.
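A sketch for the exercise (our own, using NumPy/SciPy; `welch` estimates the spectral density from a single realization):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
N = 2**16
e = rng.standard_normal(N)                    # white noise: flat spectrum

# Low-pass filtered white noise: y(k) = 0.9 y(k-1) + e(k).
y = signal.lfilter([1.0], [1.0, -0.9], e)     # spectrum peaks at low frequency

s = 2.0 * np.sin(0.3 * np.pi * np.arange(N))  # sinusoid: a single spectral line

for name, x in (("white", e), ("low-pass", y), ("sinusoid", s)):
    w, Pxx = signal.welch(x, nperseg=4096)    # (frequencies, spectrum estimate)
    # peak-to-mean ratio ~1 for a flat spectrum, large for a spectral line
    print(name, w[np.argmax(Pxx)], Pxx.max() / Pxx.mean())
```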
A white noise sequence x(k) is one that is completely uncorrelated in time:

E{(x(k) − x̄)(x(k − τ) − x̄)^T} = { R_x  if τ = 0;  0  if τ ≠ 0 }     (1.107)

Any stationary disturbance can be modeled as filtered white noise,

d(k) = H(q)ε(k) + d̄     (1.109)

where ε(k) is a zero-mean white sequence, d̄ is an arbitrary constant, and the filter H(q) has no pole or zero outside the unit disk. In other words, the first and second order moments of any stationary signal can be matched by the above model.
This result is very useful in modeling disturbances whose covariance functions are known or fixed. Note that a stationary Gaussian process is completely
specified by its mean and covariance. Such a process can be modelled by filtering a zero-mean Gaussian white sequence through appropriate dynamics
determined by its spectrum (plus adding a bias at the output if the mean is
not zero).
[Figure: equivalent block-diagram arrangements for generating a nonstationary (mean-shifting) disturbance y(k): white noise ε(k) passed through an integrator 1/(1 − q^{-1}) and a filter H(q^{-1}), in either order, or through the combined filter H(q^{-1})/(1 − q^{-1}).]
A stationary process can also be generated in state-space form:

x(k+1) = Ax(k) + Bε(k)
y(k) = Cx(k) + Dε(k)     (1.111)

where ε(k) is zero-mean white noise with covariance R_ε. The state covariance evolves according to

E{x(k+1) x^T(k+1)} = A E{x(k) x^T(k)} A^T + B R_ε B^T     (1.112)

If all the eigenvalues of A are strictly inside the unit disk, the above approaches a stationary process as k → ∞, since

lim_{k→∞} E{x(k)} = 0     (1.113)
lim_{k→∞} E{x(k) x^T(k)} = R̄_x     (1.114)

where R̄_x solves R̄_x = A R̄_x A^T + B R_ε B^T.
The output covariance is then

R_y(τ) = E{y(k + τ) y^T(k)} = { C R̄_x C^T + D R_ε D^T  for τ = 0;  C A^τ R̄_x C^T + C A^{τ−1} B R_ε D^T  for τ > 0 }     (1.116)

The spectrum of y is obtained by taking the Fourier transform of R_y(τ) and can be shown to be

Φ_y(ω) = [C(e^{jω}I − A)^{-1}B + D] R_ε [C(e^{−jω}I − A)^{-1}B + D]^T     (1.117)
In the case that A contains eigenvalues on or outside the unit circle, the process is nonstationary as its covariance keeps increasing (see Eqn. (1.112)). However, it is common to include integrators in A to model mean-shifting (random-walk-like) behavior. If all the outputs exhibit this behavior, one can use

x(k+1) = Ax(k) + Bε(k)
Δy(k) = Cx(k) + Dε(k)     (1.118)

Note that, with a stable A, while Δy(k) is a stationary process, y(k) includes an integrator and therefore is nonstationary.
[Figure: a stable system x(k+1) = Ax(k) + Bε(k), y(k) = Cx(k) + Dε(k) driven by white noise ε(k) produces a stationary process; appending an integrator 1/(1 − q^{-1}) at the output produces a nonstationary (mean-shifting) process.]
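The distinction is easy to see in simulation (a sketch of ours, scalar case):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5000
e = rng.standard_normal(N)

# Stable first-order system: stationary output.
y_stat = np.zeros(N)
for k in range(1, N):
    y_stat[k] = 0.8 * y_stat[k - 1] + e[k]   # x(k+1) = 0.8 x(k) + eps(k)

# Same signal passed through an integrator: mean-shifting random walk.
y_ns = np.cumsum(y_stat)                     # y(k) = y(k-1) + Dy(k)

print(np.var(y_stat[:N//2]), np.var(y_stat[N//2:]))  # comparable variances
print(np.var(y_ns[:N//2]), np.var(y_ns[N//2:]))      # variance keeps growing
```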
Chapter 6
State Estimation
In practice, it is unrealistic to assume that all the disturbances and state
variables can be measured. In general, one must estimate the state from the
measured input / output sequences. This is called state estimation.
Let us bring in the standard state-space system description we introduced in the previous chapter:

x(k+1) = Ax(k) + Bu(k) + ε_1(k)
y(k) = Cx(k) + ε_2(k)     (6.1)

where ε_1(k) and ε_2(k) are zero-mean white noise sequences with

E{ [ε_1(k); ε_2(k)] [ε_1^T(k)  ε_2^T(k)] } = [R_1  R_{12}; R_{12}^T  R_2]     (6.2)
The main role of state estimation in the context of MPC is to realign the model state to the process based on the measured signals, so that accurate multi-step predictions of the outputs, measured and possibly unmeasured, can be made. Because most of the state estimation literature focuses on estimation techniques, the importance of disturbance modeling is often overlooked. Simply adding white noises to the state and output equations of a deterministic system model, as is sometimes done by those who misunderstand the meaning of white noise in the standard state-space system description, can lead to extremely poor results regardless of the technique. In general, to obtain satisfactory results, disturbances (or their overall effect on the outputs) should be modeled as appropriate stationary / non-stationary stochastic processes, and the system equations must be augmented with their describing stochastic equations, as we have shown in the previous chapter.
6.1 Observer Structure

We start the discussion with a simple linear state estimator structure for system (6.1):

x̂(k|k−1) = A x̂(k−1|k−1) + B u(k−1)
x̂(k|k) = x̂(k|k−1) + K (y(k) − C x̂(k|k−1))     (6.3)
In the above, x̂(i|j) denotes an estimate of x(i) based on the measurements available at time j. The two equations allow us to recursively compute the filtered estimate x̂(k|k). The first (which we refer to as the model forwarding equation) corresponds to the simple propagation of the model state to the next time step, without accounting for errors in the estimate and the effect of new noise. The second (which we refer to as the measurement update equation) attempts to compensate for these neglected factors by correcting the estimate on the basis of the term called the innovation, which is the difference between the actual measurement of the output and its predicted value from the current state estimate.
In some applications, one may need to compute the one-step-ahead prediction x̂(k+1|k) rather than the filtered estimate. For instance, in situations where the control computation requires one full sample period, one needs x̂(k+1|k) at time k in order to begin the computation of the control input u(k+1). This is not a problem, as (6.3) can be implemented also as a one-step-ahead predictor simply by executing the measurement correction step first and the model forwarding step afterwards:

x̂(k|k) = x̂(k|k−1) + K (y(k) − C x̂(k|k−1))
x̂(k+1|k) = A x̂(k|k) + B u(k)     (6.4)
Substituting the first equation into the second, the predictor can be written in the equivalent form

x̂(k+1|k) = A x̂(k|k−1) + B u(k) + K̄ {y(k) − C x̂(k|k−1)}     (6.5)

where K̄ = AK. The gain matrix K is to be chosen such that the estimation error x̃_e(k) = x(k) − x̂(k|k) is minimized in some sense.
Equations for the error dynamics can be easily derived. For instance, the equation for the one-step-ahead prediction error x_e(k) = x(k) − x̂(k|k−1) is

x_e(k+1) = (A − K̄C) x_e(k) + ε_1(k) − K̄ ε_2(k)     (6.6)

and that for the filtered-estimate error is

x̃_e(k+1) = (A − KCA) x̃_e(k) + (I − KC) ε_1(k) − K ε_2(k+1)     (6.7)

6.2 Pole Placement

It is clear that, for observer stability, all the eigenvalues of (A − K̄C) must lie strictly inside the unit disk. In addition, the eigenvalues of (A − KCA) coincide with those of (A − K̄C), since A − K̄C = A(I − KC) and A − KCA = (I − KC)A share the same eigenvalues. It can be shown that, if (C, A) is an observable pair, the observer poles can be placed at arbitrary locations through K. The proof is left as a homework exercise. The conventional pole placement technique requires the system model to be transformed into certain canonical forms (such as the observer form) so that the observer eigenvalues can be related to the parameters in K in a transparent manner.
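In practice the gain can also be computed numerically; a sketch of ours using SciPy, where `place_poles` is applied to the dual system (A^T, C^T), a standard duality trick (the matrices here are hypothetical examples):

```python
import numpy as np
from scipy.signal import place_poles

A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
C = np.array([[1.0, 0.0]])

# Observer design is dual to state-feedback design: placing the
# eigenvalues of (A^T - C^T K^T) places those of (A - K C).
poles = [0.5, 0.6]
res = place_poles(A.T, C.T, poles)
K = res.gain_matrix.T              # predictor gain (K-bar in the text)

print(np.linalg.eigvals(A - K @ C))  # ~ [0.5, 0.6]
```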
Unfortunately, pole placement has seen very little use in process control. A major drawback of pole placement is that, in general, it is not clear where the observer poles should be placed for good estimation performance in a given situation. An immediate factor in choosing the observer pole locations is the trade-off between the speed of convergence and noise filtering. This is a problem that can be solved relatively easily and satisfactorily through some tuning of the pole locations.

A more subtle and difficult problem is that pole location is not the sole factor that determines the speed of recovery from errors. For instance, even dead-beat observers, which have all their poles at the origin, can be very slow in recovering from certain types of errors. This is particularly true when the state dimension is high in relation to the output dimension. As a simple example, consider the case that a deadbeat observer resulted in the following error transition equation:
x_e(k+1) = [0 1 0 ⋯ 0; 0 0 1 ⋯ 0; ⋮ ⋱; 0 0 0 ⋯ 1; 0 0 0 ⋯ 0] x_e(k)     (6.9)

where the matrix is (A − K̄C).
Even though all the observer poles are placed at the origin, it still takes n sample steps to reject an error in the direction of the last eigenvector e_n. On the other hand, this could be the mode that is most vulnerable to disturbances, in which case the deadbeat observer would perform very poorly.
For efficient estimation, it is helpful to incorporate into the estimator design some statistical information on how disturbances affect the state. For instance, even though the state may be of very high dimension, the number of modes that are affected by disturbances may be much lower, and priority can be given to attributing errors to those modes. These considerations motivate the development of a statistically optimal estimator.

Example 6.1 NIKET: Demonstrate the above point! Compare the performance of a deadbeat observer designed through pole placement with a Kalman filter. Make n fairly large. Note that one can always choose the primary disturbance direction such that the deadbeat observer would perform poorly.
6.3 Kalman Filter

The filter gain matrix can also be determined to be optimal in some statistical sense. For example, it can be chosen to minimize the variance of the estimation error. The resulting estimator is the celebrated Kalman filter, which has been the most popular state estimation technique by far. We present a derivation for the simple case (when R_{12} = 0) and discuss some properties.

For simplicity of discussion, let us assume for now that ε_1 and ε_2 are mutually independent with R_1 ≥ 0 and R_2 > 0. Recall that the linear estimator of (6.4) can be written in the following one-step-ahead predictor form:

x̂(k+1|k) = A x̂(k|k−1) + B u(k) + K̄(k) {y(k) − C x̂(k|k−1)}     (6.10)

In the above, we allowed the filter gain matrix to vary with time for more generality.
6.3.1 Derivation

The one-step-ahead prediction error evolves according to

x_e(k+1) = (A − K̄(k)C) x_e(k) + ε_1(k) − K̄(k) ε_2(k)     (6.11)

Let

P(k) = E{x_e(k) x_e^T(k)}     (6.12)

Let us assume that the initial guess is chosen so that E{x_e(0)} = 0. Then, by applying the rules for the expectation operator given in Appendix D to (6.11), we obtain E{x_e(k)} = 0 for all k ≥ 0 and

P(k+1) = E{x_e(k+1) x_e^T(k+1)}
       = (A − K̄(k)C) P(k) (A − K̄(k)C)^T + R_1 + K̄(k) R_2 K̄^T(k)     (6.13)

In the above, we used the fact that x_e(k), ε_1(k) and ε_2(k) in (6.11) are mutually independent. Now let us choose K̄(k) such that α^T P(k+1) α is minimized for an arbitrary choice of α. It is straightforward algebra to show that the minimizing gain is
K̄(k) = A P(k) C^T (R_2 + C P(k) C^T)^{-1}     (6.15)

with the resulting covariance recursion

P(k+1) = A P(k) A^T + R_1 − A P(k) C^T (R_2 + C P(k) C^T)^{-1} C P(k) A^T     (6.16)
Given x̂(1|0) and P(1), equations (6.15) and (6.16) can be used along with (6.10) to recursively compute x̂(k+1|k). They are referred to as the time-varying Kalman filter equations. The matrix recursion formula (6.16) is called the Riccati Difference Equation (RDE).

Suppose P(k) in the RDE converges to a steady-state solution P̄ as k → ∞. Then, we may consider the possibility of sacrificing the optimality during transient periods and implementing the filter with the constant filter gain matrix given by

K̄ = A P̄ C^T (R_2 + C P̄ C^T)^{-1}     (6.17)

P̄ can be obtained either by iterating on the RDE or by finding a positive semi-definite solution of the following Algebraic Riccati Equation (ARE):

P̄ = A P̄ A^T + R_1 − A P̄ C^T (R_2 + C P̄ C^T)^{-1} C P̄ A^T     (6.18)

This we refer to as the steady-state Kalman filter, as opposed to the time-varying Kalman filter. Properties of the ARE and the steady-state Kalman filter will be discussed later.
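A sketch of ours for computing the steady-state gain by iterating the RDE (6.16) (the numerical values are hypothetical examples):

```python
import numpy as np

def steady_state_kalman(A, C, R1, R2, iters=2000):
    """Iterate the RDE (6.16) to (approximate) convergence."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = R2 + C @ P @ C.T
        P = A @ P @ A.T + R1 - A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
    Kbar = A @ P @ C.T @ np.linalg.inv(R2 + C @ P @ C.T)   # (6.17)
    return P, Kbar

A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
R1 = 0.01 * np.eye(2)
R2 = np.array([[0.1]])

P, Kbar = steady_state_kalman(A, C, R1, R2)
print(np.linalg.eigvals(A - Kbar @ C))   # stable: strictly inside unit disk
```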
Remarks:

Recall the relationship K̄(k) = A K(k) between the one-step-ahead predictor gain K̄(k) and the filter gain

K(k) = P(k) C^T (R_2 + C P(k) C^T)^{-1}     (6.19)

This gain matrix should be used to implement the optimal filter in the form of (6.3), which recursively computes x̂(k|k) rather than x̂(k+1|k).

The assumption of independence between ε_1(k) and ε_2(k) may not always be met. The derivation given above can be modified in a straightforward manner for this case (see Exercise??? at the end of this chapter).
In the above derivation, we imposed the linear estimator structure and optimized the filter gain matrix based on some cost index. We can also show that the Kalman filter is the optimal estimator in the sense of minimizing the conditional expectation of the error, provided that ε_1 and ε_2 are Gaussian noises in addition to being white noises.

Example 6.2 NIKET: Apply the Kalman filter to the same problem for which pole placement didn't work very well. Show the importance of disturbance modeling. Show the performance obtained by just adding white noise to a deterministic model when integrated white noise disturbances enter.
6.3.2 Properties

The convergence of the RDE and the stability of the resulting steady-state filter can be summarized as follows:

If (C, A) is detectable and (A, R_1^{1/2}) is stabilizable, then P(k) → P̄ as k → ∞, where P̄ is a stabilizing solution of the ARE (i.e., A − K̄C has all its eigenvalues strictly inside the unit disk).

If (A, R_1^{1/2}) has no uncontrollable mode on the unit circle and P(0) > 0, then P(k) → P̄ as k → ∞, which is a stabilizing solution of the ARE.

The stabilizability of (A, R_1^{1/2}) in the first statement implies that all the unstable modes of A are independently excited by the external noise. This means that the covariance for these modes will grow without bound if the filter is not stabilizing. By virtue of optimality, the optimal filter has to contain the growth if capable, and this means the optimal filter is necessarily stable.
The second statement says that we can have exponentially unstable modes that are not excited by the external noise, but the optimal filter still achieves observer stability. The stronger requirement of P(0) > 0 ensures that errors in these modes will grow exponentially fast if left alone. For optimality, the growth and reduction in the covariance (for the unstable modes) have to reach a steady state, and the balance point is, in fact, a nonzero covariance. This means that the optimal filter has to be stable.

The reason why uncontrollable modes on the unit circle cause problems is that, unlike errors in the exponentially unstable modes, errors in these modes are eventually reduced to zero by the time-varying Kalman filter. Hence, at steady state, the covariance for these modes drops to zero and the optimal filter no longer provides any correction to these modes, thereby leaving them unstabilized.

The question of whether the steady-state solution is the unique positive semi-definite solution of the ARE is relevant because one may wish to find the steady-state solution directly by solving the ARE rather than iterating on the RDE. If more than one positive semi-definite solution exists, one may very well end up with a non-stabilizing solution even though a stabilizing solution exists. The answer is:
The ARE has a unique stabilizing positive semi-definite solution if (C, A) is detectable and (A, R_1^{1/2}) has no uncontrollable mode on the unit circle.
6.3.3 Extensions

Time-Varying System

Consider the time-varying system

x(k+1) = A(k)x(k) + B(k)u(k) + ε_1(k)
y(k) = C(k)x(k) + ε_2(k)     (6.20)
where ε_1(k) and ε_2(k) are zero-mean white noises of covariances R_1(k) and R_2(k). The system, for instance, may serve as an approximation to some nonlinear system evolving along a trajectory. Generalizing the Kalman filter equations to the above system can be done in a straightforward manner. Recall that we already derived the equations for the optimal time-varying estimator for the linear time-invariant system. The fact that the system dynamics vary with k does not cause any further complication, and one gets the same form of the estimator equation and update formula:

x̂(k+1|k) = A(k) x̂(k|k−1) + B(k) u(k) + K̄(k) {y(k) − C(k) x̂(k|k−1)}     (6.21)

where

K̄(k) = A(k) P(k) C^T(k) (R_2(k) + C(k) P(k) C^T(k))^{-1}     (6.22)

and

P(k+1) = A(k) P(k) A^T(k) + R_1(k) − A(k) P(k) C^T(k) (R_2(k) + C(k) P(k) C^T(k))^{-1} C(k) P(k) A^T(k)     (6.23)

In this case, however, the covariance matrix and the gain matrix do not converge to steady-state values in general.
Periodically Time-Varying System and Multi-Rate Kalman Filter

Let us consider the system

x_k(t+1) = A(t) x_k(t) + B(t) u_k(t) + ε_{1,k}(t)
y_k(t) = C(t) x_k(t) + ε_{2,k}(t),     t = 0, ⋯, N−1
x_{k+1}(0) = x_k(N)     (6.24)

The above represents a periodic linear system of period N; k is the run index and t is the time index within a period.

Such periodic systems are common in practice. Batch systems that evolve along some pre-specified trajectories can be modeled as such. Cyclic operation of a continuous process also results in periodically varying dynamics. Finally, a multi-rate sampled-data system (with sample rates that are integer multiples of some basic sampling unit) is a periodic system, the period of which is given by the least common multiple of all the sampling periods. In this case, only the C matrix is periodically varying.
The Kalman filter for the above can be derived straightforwardly from the time-varying Kalman filter equations and is represented by the equations below:

x̂_k(t+1|t) = A(t) x̂_k(t|t−1) + B(t) u_k(t) + K̄_k(t) {y_k(t) − C(t) x̂_k(t|t−1)},     t = 0, ⋯, N−1
x̂_{k+1}(0|−1) = x̂_k(N|N−1)     (6.25)

where

K̄_k(t) = A(t) P_k(t) C^T(t) (R_2(t) + C(t) P_k(t) C^T(t))^{-1}     (6.26)

and

P_k(t+1) = A(t) P_k(t) A^T(t) + R_1(t) − A(t) P_k(t) C^T(t) (R_2(t) + C(t) P_k(t) C^T(t))^{-1} C(t) P_k(t) A^T(t),
P_{k+1}(0) = P_k(N)     (6.27)
Due to the periodic nature, for detectable systems, as k → ∞,

K̄_k(t) → K̄_∞(t)
P_k(t) → P_∞(t)

Hence, at steady state, the time-varying solution converges to a periodic solution. As in the case of linear time-invariant systems, if one is willing to accept some performance loss during the initial transient period, one can implement the periodic solution rather than the time-varying solution. This offers a computational advantage, as the periodic filter gains can be computed off-line and stored.

Periodic solutions {P_∞(t)} and {K̄_∞(t)} can be computed in two different ways:

One can iterate on the RDE (6.27) until it converges to a periodic solution. With {P_∞(t)}, one can obtain {K̄_∞(t)} according to (6.26).
Alternatively, the periodic system can be expressed equivalently as the following lifted time-invariant system, to which the standard steady-state results apply:

x_{k+1}(0) = Ã x_k(0) + B̃ [u_k(0); u_k(1); ⋮; u_k(N−1)]

[y_k(0); y_k(1); ⋮; y_k(N−1)] = C̃ x_k(0) + D̃ [u_k(0); u_k(1); ⋮; u_k(N−1)]     (6.28)
6.4 Least Squares Estimation

6.4.1 Batch Formulation

For state estimation of system (6.1), one can consider solving the following least squares problem at each time:

J_k = min_{x_e(1), ε_1(i), ε_2(i)} [ x_e^T(1) Q_0(1) x_e(1) + Σ_{i=1}^k ( ε_1^T(i) Q_1 ε_1(i) + ε_2^T(i) Q_2 ε_2(i) ) ]

subject to

x_e(1) = x(1) − x̂(1|0)
x(i+1) = A x(i) + B u(i) + ε_1(i)
y(i) = C x(i) + ε_2(i),     i = 1, ⋯, k     (6.29)
The first term x_e(1) represents the error in the initial state estimate. The second term ε_1(i) represents the error in the state transition as predicted by x(i+1) = Ax(i) + Bu(i). Finally, ε_2(i) represents the error in the output prediction given by y(i) = Cx(i). Since we are penalizing these errors in the objective function, the weighting matrices should reflect our relative confidence in the initial estimate, the state transition equation and the output prediction equation: the more confident we are, the higher the weighting. Use of other norms in the objective function is always possible, but the 2-norm formulation gives some advantages in terms of implementation and will also turn out to be the most convenient choice for our discussion.
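The unconstrained batch problem is an ordinary linear least squares problem and can be solved by stacking the unknowns. A small sketch of ours (the state sequence is the decision variable; square roots of the weights pose the problem for `lstsq`):

```python
import numpy as np

def batch_ls_estimate(A, B, C, u, y, x1_prior, Q0, Q1, Q2):
    """Solve (6.29) for the state sequence x(1), ..., x(k+1)."""
    k = len(y)
    n = A.shape[0]
    p = C.shape[0]
    # Q = L^T L with L = cholesky(Q)^T, so ||L r||^2 = r^T Q r.
    L0 = np.linalg.cholesky(Q0).T
    L1 = np.linalg.cholesky(Q1).T
    L2 = np.linalg.cholesky(Q2).T

    M = np.zeros((n + k * (n + p), (k + 1) * n))
    b = np.zeros(n + k * (n + p))

    # Initial-estimate penalty: L0 (x(1) - x1_prior).
    M[:n, :n] = L0
    b[:n] = L0 @ x1_prior
    r = n
    for i in range(k):
        # State-transition penalty: L1 (x(i+1) - A x(i) - B u(i)).
        M[r:r+n, (i+1)*n:(i+2)*n] = L1
        M[r:r+n, i*n:(i+1)*n] = -L1 @ A
        b[r:r+n] = L1 @ (B @ u[i])
        r += n
        # Output penalty: L2 (y(i) - C x(i)).
        M[r:r+p, i*n:(i+1)*n] = L2 @ C
        b[r:r+p] = L2 @ y[i]
        r += p

    X, *_ = np.linalg.lstsq(M, b, rcond=None)
    return X.reshape(k + 1, n)   # smoothed state sequence
```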
Notation J_k will be used to denote the least squares problem as well as the optimal cost. With the solutions to J_k (denoted hereafter by x̂_e(1|k), ε̂_1(i|k), ε̂_2(i|k) for consistency with previous notations), we can construct a smoothed estimate for the entire state sequence:

x̂(1|k) = x̂(1|0) + x̂_e(1|k)     (6.30)
x̂(i+1|k) = A x̂(i|k) + B u(i) + ε̂_1(i|k),     i = 1, ⋯, k     (6.31)

In addition, one can also add constraints on the estimated error sequence:

ε_1(i) ∈ E_1  and  ε_2(i) ∈ E_2     (6.32)
With the introduction of a nonlinear model of the form

x(i+1) = f(x(i), u(i)) + ε_1(i)
y(i) = g(x(i)) + ε_2(i)     (6.33)

one would end up with a nonlinear least squares problem, which is computationally more difficult to solve.
Note that, even though the method is cast in a deterministic least squares
setting, the design is not made any easier as one must still choose the weighting
matrices. The problem of specifying the weighting matrices is essentially the
same as that of specifying the covariance matrices in the Kalman filter design.
In fact, we will be able to relate the weighting matrices to the noise covariance
matrices explicitly when we later establish an equivalence between the least
squares estimation method and the Kalman filter.
Of course, the above-mentioned advantages do not come for free; in order
to realize the benefits, one must be willing to accept higher computational
cost.
At first glance, the computational effort for the least squares problem seems
much higher than the Kalman filter, even in the unconstrained linear case; in
fact, it appears to grow unbounded with time. For this case, however, we
will show that we can use dynamic programming to derive a recursive formula
for x̂(k+1|k) for the above least squares problem. This, in fact, reduces
solving the least squares problem to the Kalman filter calculation. With this
approach, however, one loses the advantage of having access to a smoothed
state sequence. To retain the advantage, one can employ a fixed-size moving
window, within which the least squares estimation is performed. One can add
an appropriate penalty term on the initial state so that the solution within the
window obtained in this manner matches that of the full problem exactly. This
does translate into a slight increase in computation and storage. The estimate
of the initial point and the associated weighting matrix can be adjusted such
that solutions coincide with those for the full batch problem.
For the constrained linear problem or the nonlinear problem, the dynamic
programming approach does not work as the optimization does not yield an
analytical solution. The only option is to employ a moving window and solve
the constrained and/or nonlinear least squares problem directly within the
window. This, of course, means a significant leap in the computational requirement since mathematical programming techniques must be employed for
solution. In addition, there is no way, in general, to choose the penalty term
on the initial state such that the estimate obtained with the moving window coincides exactly with that of the full problem.

The use of a moving estimation window and the related issues will be discussed in more detail later on.
6.4.2 Equivalence with the Kalman Filter

For the unconstrained linear problem, the optimal one-step-ahead estimate can be computed recursively through

x̂(k+1|k) = A x̂(k|k−1) + A Q_0^{-1}(k) C^T (Q_2^{-1} + C Q_0^{-1}(k) C^T)^{-1} {y(k) − C x̂(k|k−1)}     (6.34)

Q_0^{-1}(k+1) = Q_1^{-1} + A [Q_0^{-1}(k) − Q_0^{-1}(k) C^T (Q_2^{-1} + C Q_0^{-1}(k) C^T)^{-1} C Q_0^{-1}(k)] A^T     (6.35)

Note that the above equations are equivalent to those for the Kalman filter if we set Q_0(k) = P^{-1}(k), Q_1 = R_1^{-1}, and Q_2 = R_2^{-1}. This implies that x̂(k+1|k) constructed from the least squares estimation is the same as the estimate from the Kalman filter, given that the weighting matrices Q_0(1), Q_1 and Q_2 are chosen as the inverses of the corresponding covariance matrices P(1), R_1 and R_2, respectively.

The rest of Section 6.4.2 will be devoted to deriving the above result. The derivation is based on dynamic programming and the ensuing algebra is somewhat involved. Readers who are not interested in it may skip the rest of the section without loss of continuity.
For notational convenience, let us ignore the terms involving the deterministic input u. Since these are known terms, dropping them does not affect our analysis in any way.

Dynamic Programming and Arrival Cost

We now consider the possibility of calculating the estimate x̂(k+1|k) recursively by solving the least squares problem of (6.29) via dynamic programming. First, note that (6.29) can be re-formulated as

J_k = min_{x(1), x(2), ⋯, x(k+1)} (x(1) − x̂(1|0))^T Q_0(1) (x(1) − x̂(1|0))
      + Σ_{i=1}^k [ (x(i+1) − Ax(i))^T Q_1 (x(i+1) − Ax(i)) + (y(i) − Cx(i))^T Q_2 (y(i) − Cx(i)) ]     (6.36)
Define the arrival cost

φ_1(x(2)) = min_{x(1)} [ (x(1) − x̂(1|0))^T Q_0(1) (x(1) − x̂(1|0)) + (x(2) − Ax(1))^T Q_1 (x(2) − Ax(1)) + (y(1) − Cx(1))^T Q_2 (y(1) − Cx(1)) ]     (6.37)

φ_1(x(2)) is the arrival cost for x(2); it represents the minimum cost incurred to arrive at a given x(2). Note that we can now rewrite (6.36) as

J_k = min_{x(2), ⋯, x(k+1)} [ φ_1(x(2)) + Σ_{i=2}^k (x(i+1) − Ax(i))^T Q_1 (x(i+1) − Ax(i)) + (y(i) − Cx(i))^T Q_2 (y(i) − Cx(i)) ]     (6.40)

to obtain x̂(k+1|k). The question we have at this point is whether we can derive a recursive equation for φ_i(x(i+1)) at each stage. With this, we should also be able to generate x̂(i+1|i) in a recursive manner. The answer is affirmative, as we will show next.
Recursive Calculation of the Arrival Cost and One-Step-Ahead Prediction

It is useful to introduce the following lemma:

Lemma 1 Consider the quadratic function x^T H x − 2 g^T x with H = H^T > 0     (6.43)

and let

x* = H^{-1} g     (6.44)

Then,

x^T H x − 2 g^T x = (x − x*)^T H (x − x*) − g^T H^{-1} g = (x − x*)^T H (x − x*) + [x^T H x − 2 g^T x]_{x=x*}     (6.45)

(6.44) can be obtained by setting the derivative of the objective function with respect to x to zero. The first equality of (6.45) can be proved by substituting (6.44) for x* in the right-hand side of the equation and showing that it reduces to the left-hand side. The second equality can be proved by substituting (6.44) into the objective function and showing that it reduces to −g^T H^{-1} g.
Let us now show that a closed-form expression for the arrival cost can be derived for the first stage. Collecting the terms of (6.37) that depend on x(1) into the form of Lemma 1, with

H = Q_0(1) + C^T Q_2 C + A^T Q_1 A
g = Q_0(1) x̂(1|0) + C^T Q_2 y(1) + A^T Q_1 x(2)     (6.46)–(6.48)

and using Lemma 1, we obtain

φ_1(x(2)) = −(Q_0(1) x̂(1|0) + C^T Q_2 y(1) + A^T Q_1 x(2))^T (Q_0(1) + C^T Q_2 C + A^T Q_1 A)^{-1} (Q_0(1) x̂(1|0) + C^T Q_2 y(1) + A^T Q_1 x(2)) + x^T(2) Q_1 x(2) + terms independent of x(2)     (6.49)
Let

x̂(2|1) = arg min_{x(2)} φ_1(x(2))     (6.50)

This is consistent with the notations we have been using previously. Then, from Lemma 1,

x̂(2|1) = [Q_1 − Q_1 A (Q_0(1) + C^T Q_2 C + A^T Q_1 A)^{-1} A^T Q_1]^{-1} Q_1 A (Q_0(1) + C^T Q_2 C + A^T Q_1 A)^{-1} (Q_0(1) x̂(1|0) + C^T Q_2 y(1))     (6.52)

and

φ_1(x(2)) = (x(2) − x̂(2|1))^T [Q_1 − Q_1 A (Q_0(1) + C^T Q_2 C + A^T Q_1 A)^{-1} A^T Q_1] (x(2) − x̂(2|1)) + φ_1(x̂(2|1)) + other terms independent of x(2)     (6.53)

where the bracketed matrix is denoted Q_0(2).
The same step can now be repeated for the second stage:

φ_2(x(3)) = min_{x(2)} [ (x(2) − x̂(2|1))^T Q_0(2) (x(2) − x̂(2|1)) + (x(3) − Ax(2))^T Q_1 (x(3) − Ax(2)) + (y(2) − Cx(2))^T Q_2 (y(2) − Cx(2)) ] + φ_1(x̂(2|1)) + constant terms     (6.54)

which has the same structure as the first stage. Continuing in this manner, one obtains for each stage j

φ_j(x(j+1)) = (x(j+1) − x̂(j+1|j))^T Q_0(j+1) (x(j+1) − x̂(j+1|j)) + φ_j(x̂(j+1|j)) + constant terms     (6.55)

x̂(j+1|j) = [Q_1 − Q_1 A (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} A^T Q_1]^{-1} Q_1 A (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} (Q_0(j) x̂(j|j−1) + C^T Q_2 y(j))     (6.56)

Q_0(j+1) = Q_1 − Q_1 A (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} A^T Q_1     (6.57)
Remarks:

Eqns. (6.56)–(6.57) represent a way to construct x̂(j+1|j) recursively.

The expression for the arrival cost φ_j(x(j+1)) includes the constant term φ_j(x̂(j+1|j)). Even though this term is carried over to the next stage in the expression for φ_{j+1}(x(j+2)), it does not affect the optimal solution x̂(j+2|j+1) and therefore can be ignored.
Lemma 2 (Matrix Inversion Lemma)

(A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}     (6.58)

The above lemma can be easily proved by multiplying both sides of the equation by (A + BCD) and showing that the right-hand side indeed reduces to identity. The proof is left as an exercise.
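A quick numerical sanity check of (6.58), as a sketch of ours with random matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = np.eye(m)
D = rng.standard_normal((m, n))

lhs = np.linalg.inv(A + B @ C @ D)
rhs = (np.linalg.inv(A)
       - np.linalg.inv(A) @ B
         @ np.linalg.inv(np.linalg.inv(C) + D @ np.linalg.inv(A) @ B)
         @ D @ np.linalg.inv(A))
print(np.allclose(lhs, rhs))   # True whenever the inverses exist
```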
Let us first show the equivalence between (6.57) and (6.35). For this, we first invert both sides of (6.57) to obtain

Q_0^{-1}(j+1) = [Q_1 − Q_1 A (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} A^T Q_1]^{-1}     (6.59)

We then apply the Matrix Inversion Lemma to the right-hand side of the above equation with

A := Q_1,  B := Q_1 A,  C := −(Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1},  D := A^T Q_1     (6.60)

We then obtain

Q_0^{-1}(j+1) = Q_1^{-1} − Q_1^{-1} Q_1 A (−(Q_0(j) + C^T Q_2 C + A^T Q_1 A) + A^T Q_1 Q_1^{-1} Q_1 A)^{-1} A^T Q_1 Q_1^{-1}
             = Q_1^{-1} + A (Q_0(j) + C^T Q_2 C)^{-1} A^T     (6.61)

Applying the Matrix Inversion Lemma once more to (Q_0(j) + C^T Q_2 C)^{-1}, we get

Q_0^{-1}(j+1) = Q_1^{-1} + A [Q_0^{-1}(j) − Q_0^{-1}(j) C^T (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} C Q_0^{-1}(j)] A^T     (6.62)

which is (6.35).
Let us next show the equivalence between (6.56) and (6.34). Using the fact (from (6.59)–(6.61)) that

[Q_1 − Q_1 A (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} A^T Q_1]^{-1} = Q_1^{-1} + A (Q_0(j) + C^T Q_2 C)^{-1} A^T     (6.65)

we obtain from (6.56)

x̂(j+1|j) = {Q_1^{-1} + A (Q_0(j) + C^T Q_2 C)^{-1} A^T} Q_1 A (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} (Q_0(j) x̂(j|j−1) + C^T Q_2 y(j))
         = {A + A (Q_0(j) + C^T Q_2 C)^{-1} A^T Q_1 A} (Q_0(j) + C^T Q_2 C + A^T Q_1 A)^{-1} (Q_0(j) x̂(j|j−1) + C^T Q_2 y(j))
         = A (Q_0(j) + C^T Q_2 C)^{-1} (Q_0(j) x̂(j|j−1) + C^T Q_2 y(j))     (6.66)

The two terms can now be treated separately with the Matrix Inversion Lemma. For the first term,

A (Q_0(j) + C^T Q_2 C)^{-1} Q_0(j) x̂(j|j−1)
  = A [Q_0^{-1}(j) − Q_0^{-1}(j) C^T (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} C Q_0^{-1}(j)] Q_0(j) x̂(j|j−1)
  = A x̂(j|j−1) − A Q_0^{-1}(j) C^T (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} C x̂(j|j−1)     (6.67)

and for the second term,

A (Q_0(j) + C^T Q_2 C)^{-1} C^T Q_2 y(j)
  = A [Q_0^{-1}(j) − Q_0^{-1}(j) C^T (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} C Q_0^{-1}(j)] C^T Q_2 y(j)
  = A Q_0^{-1}(j) C^T [I − (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} C Q_0^{-1}(j) C^T] Q_2 y(j)
  = A Q_0^{-1}(j) C^T (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} (Q_2^{-1} + C Q_0^{-1}(j) C^T − C Q_0^{-1}(j) C^T) Q_2 y(j)
  = A Q_0^{-1}(j) C^T (Q_2^{-1} + C Q_0^{-1}(j) C^T)^{-1} y(j)     (6.68)

In both derivations, the first step requires use of the Matrix Inversion Lemma and the rest is straightforward algebra. Putting the two together gives equation (6.34).
The equivalence between the unconstrained least squares estimator and the Kalman filter is relevant in several ways. First, it gives a physically meaningful way to choose the weighting matrices for the least squares estimator. Second, it points to the fact that simply minimizing the residual of a deterministic model in the least squares estimation would amount to using a Kalman filter designed by adding artificial white noise terms to the state and output equations of the deterministic model, which would perform poorly in most situations. Even though the least squares estimation is cast in a completely deterministic setting, disturbance modeling is just as important here as in the Kalman filter design. Third, it provides some insight into when the least squares estimator will perform well and when it won't. Specifically, it shows that the Gaussian noise assumption is inherent to the least squares estimator just as it is to the Kalman filter. One advantage of the least squares estimator over the Kalman filter, however, is that constraints can be used to alter the underlying noise statistics.
6.4.3 Moving Horizon Estimation

We mentioned earlier that the full least squares problem must be solved directly when a smoothed estimate of the past state sequence is desired or when the problem formulation involves constraints and / or a nonlinear model. We also mentioned the use of a moving estimation window as a way to contain the size of the least squares problem. Estimation with a fixed-size estimation window that moves in time will be referred to as Moving Horizon Estimation (MHE).

Formulation of MHE

Consider the full batch least squares problem of (6.29). Based on the same forward dynamic programming argument we used earlier, this problem can be reformulated as

J_k = min [ φ_{k−m}(x(k−m+1)) + Σ_{i=k−m+1}^k ( ε_1^T(i) Q_1 ε_1(i) + ε_2^T(i) Q_2 ε_2(i) ) ]     (6.69)

ε_1(i) = x(i+1) − A x(i) − B u(i)     (6.70)

ε_2(i) = y(i) − C x(i)     (6.71)
To be able to use the MHE strategy to solve the full batch problem exactly, we must be able to compute the arrival cost in some recursive fashion. For unconstrained linear problems, we have already derived a recursive formula for the arrival cost (Equations (6.55)–(6.57), which are essentially the Kalman filter update equations). Note that the constant term φ_j(x̂(j+1|j)) in (6.55) can be dropped since it doesn't affect the solution.
To the above, constraints of types (6.31) and (6.32) can be added. With constraints, the least squares problem no longer yields an analytical solution. If the constraints are formulated as linear inequalities and φ_{k−m}(x(k−m+1)) is quadratic, the resulting problem is a QP. With the introduction of a nonlinear model of form (6.33), the resulting problem is an NLP.

For constrained linear systems or unconstrained / constrained nonlinear systems, there is no way to compute the exact arrival cost in a recursive manner. In these cases, the only option is to compute the arrival cost approximately. Here, it is better to use an approximate cost that lower-bounds the exact cost, i.e., use φ̃_{k−m}(x(k−m+1)) in place of φ_{k−m}(x(k−m+1)) in (6.69) such that

φ̃_{k−m}(x(k−m+1)) ≤ φ_{k−m}(x(k−m+1))     ∀ x(k−m+1) ∈ X     (6.72)
Example 6.4 Show a simple state estimation problem where the constraints help. Rawlings, Muske, and Lee (Automatica, 2001).

Example 6.5 Show a simple parameter estimation example with different noise distributions. Robertson and Lee (Automatica, 2002).
Examples to Include

1. Demonstrate the difficulty of pole placement design. Use the distillation column model with a single temperature output?

2. Demonstrate the effectiveness of the Kalman filter on the same problem.

3. Show the importance of disturbance modeling in the Kalman filter design. Compare with the design where a white noise is simply added to the deterministic model.

4. Demonstrate the various conditions for the existence and uniqueness of a stabilizing solution of the ARE through a simple example. (HOMEWORK Problem for CHE6400)

5. Examples of Constrained Least Squares estimation and MHE. Demonstrate the use of constraints and the benefit. (Robertson and Lee (Automatica, 2002).)
Exercises

1. Suppose R_{12} = E{ε_1(k) ε_2^T(k)} ≠ 0. Note that (6.1) can be rewritten as

x(k+1) = Ax(k) + Bu(k) + R_{12} R_2^{-1} y(k) + ε̃_1(k),     ε̃_1(k) = ε_1(k) − R_{12} R_2^{-1} ε_2(k)     (6.75)

with

E{ [ε̃_1(k); ε_2(k)] [ε̃_1^T(k)  ε_2^T(k)] } = [R_1 − R_{12} R_2^{-1} R_{12}^T  0; 0  R_2]     (6.76)

The presence of the extra input term R_{12} R_2^{-1} y(k) does not affect the filter gain matrix calculation, as it is a known term. Hence, the same formulae can be used to design the optimal filter for the case when the independence assumption is not satisfied.
2. Show that, for an observable system, the observer poles can be placed at arbitrary locations.

3. Derive the form of the system matrices for the lifted system (6.28) in terms of the time-varying system matrices.

4. Prove Lemma 1.

5. Prove the Matrix Inversion Lemma.

6. Show that the batch least squares problem yields the maximum a posteriori estimate of the state sequence, which corresponds to the maximum of the conditional density of the state sequence given the measurements.
Bibliography

1. Observer theory. Luenberger, etc.

2. Pole placement design can be found in most textbooks for linear systems, including????

3. The Kalman filter was first presented in Kalman's original paper (???). DISCUSS! Extensions to continuous time systems can be found in Kwakernaak and Sivan (??????). A good overview of the variations of the Kalman filter, such as the extended Kalman filter (EKF) for nonlinear systems, is given in Jazwinski (????).

4. Least squares estimation and its connection with the Kalman filter was shown by .... Also, the connection with maximum a posteriori estimation for Markov systems is discussed in...... A good overview of both can be found in Jazwinski (?????).
Items to Be Moved

Kalman Filter As The Optimal Bayesian Estimator For Gaussian Systems

THIS SECTION WILL BE MOVED TO THE APPENDIX!

In the previous section, we assumed a linear observer structure and posed the problem as a parametric optimization where the expected value of the estimation error variance is minimized with respect to the observer gain. In fact, the Kalman filter can be derived from an entirely probabilistic argument, i.e., by deriving a Bayesian estimator that recursively computes the conditional density of x(k).

Assume that ε_1(k) and ε_2(k) are Gaussian noise sequences. Then, assuming x(0) is also a Gaussian variable, x(k) and y(k) are jointly Gaussian sequences. Now we can simply formulate the state estimation problem as computing the conditional expectation E{x(k) | Y(k)}, where Y(k) = [y^T(1), ⋯, y^T(k)]^T.
Let us denote E{x(i) | Y(j)} as x̂(i|j). We divide the estimation into the following two steps.

Model Update: Compute E{x(k)|Y(k−1)} given E{x(k−1)|Y(k−1)} and u(k−1).

Since x(k) = Ax(k−1) + Bu(k−1) + ε_1(k−1) and ε_1(k−1) is a zero-mean variable independent of y(1), ⋯, y(k−1),

x̂(k|k−1) = E{Ax(k−1) + Bu(k−1) + ε_1(k−1) | Y(k−1)}
         = A E{x(k−1) | Y(k−1)} + B u(k−1)     (6.77)

Hence, we obtain

x̂(k|k−1) = A x̂(k−1|k−1) + B u(k−1)     (6.78)

Therefore, the prediction error covariance is

P̄(k) = E{ (x(k) − x̂(k|k−1)) (x(k) − x̂(k|k−1))^T } = A P(k−1) A^T + R_1     (6.80)
Measurement Update: Compute E{x(k)|Y(k)} given E{x(k)|Y(k−1)} and y(k). Note that x(k) − x̂(k|k−1) and y(k) − ŷ(k|k−1) are jointly Gaussian with

E{ [x(k) − x̂(k|k−1); y(k) − ŷ(k|k−1)] [x(k) − x̂(k|k−1); y(k) − ŷ(k|k−1)]^T } = [P̄(k)  P̄(k)C^T; CP̄(k)  CP̄(k)C^T + R_2]     (6.81)

Now, recall the earlier results for jointly Gaussian variables:

E{x|y} = E{x} + R_xy R_y^{-1} (y − E{y})     (6.82)
Cov{x|y} = R_x − R_xy R_y^{-1} R_yx     (6.83)

Applying these,

x̂(k|k) = x̂(k|k−1) + P̄(k)C^T (CP̄(k)C^T + R_2)^{-1} (y(k) − C x̂(k|k−1))     (6.84)

P(k) = Cov{x(k)|Y(k)} = P̄(k) − P̄(k)C^T (CP̄(k)C^T + R_2)^{-1} C P̄(k)     (6.85)
In short, for Gaussian systems, we can compute the conditional mean and covariance of x(k) recursively using

x̂(k|k−1) = A x̂(k−1|k−1) + B u(k−1)     (6.86)

x̂(k|k) = x̂(k|k−1) + P̄(k)C^T (CP̄(k)C^T + R_2)^{-1} (y(k) − C x̂(k|k−1))     (6.87)

where the gain is K(k) = P̄(k)C^T (CP̄(k)C^T + R_2)^{-1}, and

P̄(k) = A P(k−1) A^T + R_1     (6.88)

P(k) = P̄(k) − P̄(k)C^T (CP̄(k)C^T + R_2)^{-1} C P̄(k)     (6.89)

Note that the above has a linear estimator structure, with the estimator gain given by the Kalman filter equations.
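The two-step recursion (6.86)–(6.89) translates directly into code; a sketch of ours for one filtering cycle:

```python
import numpy as np

def kalman_step(xhat, P, u_prev, y, A, B, C, R1, R2):
    """One cycle of (6.86)-(6.89): model update, then measurement update."""
    # Model update (6.86), (6.88).
    xpred = A @ xhat + B @ u_prev
    Pbar = A @ P @ A.T + R1
    # Measurement update (6.87), (6.89).
    S = C @ Pbar @ C.T + R2
    K = Pbar @ C.T @ np.linalg.inv(S)
    xnew = xpred + K @ (y - C @ xpred)
    Pnew = Pbar - K @ C @ Pbar
    return xnew, Pnew
```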
Chapter 7
RANDOM VARIABLES
INTRODUCTION
28
What Is Statistics?
Statistics deals with the application of probability theory to real problems.
There are two basic problems in statistics.
Given a probabilistic model, predict the outcome of future trial(s). For
instance one may say:
choose the prediction x
such that expected value of (x x
)2 is
minimized.
Given collected data, define / improve a probabilistic model.
For instance, there may be some unknown parameters (say ) in the
probabilistic model. Then, given data X generated from the particular
probabilistic model, one should construct an estimate of in the form
of (X).
For example, (X)
may be constructed based on the objective
2.
of minimizing expected value of k k
2
Another related topic is hypothesis testing, which has to do with testing
whether a given hypothesis is correct (i.e, how correct defined in terms
of probability), based on available data.
In fact, one does both. That is, as data come in, one may continue to
improve the probabilistic model and use the updated model for further prediction.
A priori Knowledge
Error
feedback
Predictor
PROBABILISTIC
MODEL
ACTUAL
SYSTEM
7.1.2
X
+
29
F( ;d)
P(;d)
Note that
P(; d)d =
dF (; d) = 1
(7.3)
In addition,
Z
b
a
P(; d) d =
b
a
)
1
1 m 2
P(; d) =
exp
2
2 2
(7.5)
30
P( ;d)
m-
68.3%
m +
T
Let d = d1 dn
be a continuous random variable vector(d Rn ).
Now we must quantify the distribution of its individual elements as well as
their correlations.
Joint Probability Distribution Function
The joint probability distribution function F (1 , , n ; d1 , , dn ) for
random variable vector d is defined as
F (1 , , n ; d1 , , dn ) = P r{d1 1 , , dn n }
(7.6)
0.16
0.14
0.12
0.1
0.08
0.06
0.04
0.02
0
3
2
2
0
1
0
2
3
31
,,
P(1 , , n ; d1 , , dn )d1 dn = 1
(7.9)
(7.11)
(7.12)
If
d1 , , dn are called mutually independent.
Example: Guassian or Jointly Normally Distributed Variables
Suppose that d = [d1 d2 ]T is a Gaussian variable. The density takes the form
of
"
(
1 m 1 2
1
1
P(1 , 2 ; d1 , d2 ) =
exp
2(1 2 )
1
2 1 2 (1 2 )1/2
2 #)
(1 m1 )(2 m2 )
2 m 2
2
+
(7.13)
1 2
2
32
(
)
1
1 1 m 1 2
= p
exp
2
1
212
P(2 ; d2 ) =
=
P(1 , 2 ; d1 , d2 ) d1
(
)
1
1 2 m 2 2
p
exp
2
2
222
(7.14)
(7.15)
(7.16)
(7.17)
Hence, (m1 , 1 ) and (m2 , 2 ) represent parameters for the marginal density of
d1 and d2 respectively. Note also that
P(1 , 2 ; d1 , d2 ) 6= P(1 ; d1 )P(2 ; d2 )
(7.18)
except when = 0.
General n-dimensional Gaussian random variable vector d = [d1 , , dn ]T
has the density function of the following form:
P(; d) = P(1 , , n ; d1 , , dn )
1
1
T 1
=
exp ( d) Pd ( d)
n
2
(2) 2 |Pd |1/2
(7.19)
(7.20)
E{f (d)} =
f ()P(; d) d
(7.21)
33
d = E{d} =
P(; d) d
(7.22)
2} =
Var{d} = E{(d d)
2 P(; d) d
( d)
(7.23)
1
P(; d) =
exp
2
2 2
2 )
(7.24)
2 )
Z
1
m
1
exp
d = m
(7.25)
d = E{d} =
2
2 2
2 )
Z
1
m
1
2
2
}=
exp
d = 2
Var{d} = E{(d d)
( m)
2
2
(7.26)
2
Hence, m and that parametrize the normal density represent the mean and
the variance of the Gaussian variable.
EXPECTATION OF RANDOM VARIABLES AND RANDOM VARIABLE FUNCTIONS: VECTOR CASE
We can extend the concepts of mean and variance similarly to the vector case.
Let d be a random variable vector that belongs to Rn .
Z
` P(` ; d` ) d`
(7.27)
d` = E{d` } =
Z
Z
=
` P(1 , , n ; d1 , , dn ) d1 , , dn
Z
2
Var{d` } = E{(d` d` ) } =
(` d` )2 P(` ; d` ) d`
(7.28)
Z
Z
=
(` d` )2 P(1 , , n ; d1 , , dn ) 1 , , dn
34
In the vector case, we also need to quantify the correlations among different
elements.
Cov{d` , dm } = E{(d` d` )(dm dm )}
(7.29)
Z
Z
(` d` )(m dm )P(1 , , n ; d1 , , dn ) d1 , , dn
Note that
Cov{d` , d` } = Var{d` }
The ratio
= p
Cov{d` , dm }
Var{d` }Var{dm }
(7.30)
(7.31)
d)
T}
Cov{d} = E{(d d)(d
(7.32)
Z
Z
d)
T P(1 , , n ; d1 , , dn ) d1 , , dn
( d)(
The (i, j)th element of Cov{d} is Cov{di , dj }. The diagonal elements of Cov{d}
are variances of elements of d. The above matrix is symmetric since
Cov{di , dj } = Cov{dj , di }
(7.33)
(7.35)
1
1
1 m 1 2
exp
(7.36)
P(; d) =
2(1 2 )
1
2 1 2 (1 2 )1/2
#)
(1 m1 )(2 m2 )
2 m 2 2
2
+
1 2
2
Then,
E{d} =
=
m2
m2
1
2
P(; d) d1 d2
(7.37)
35
Z Z
1 m 1
(1 m1 ) (2 m2 ) P(; d) d1 d2
Cov{d} =
2 m 2
2
1
1 2
=
(7.38)
1 2
22
Example: Gaussian Variables n-Dimensional Case
Let d = [d1 dn ]T and
1
1
T 1
exp ( d) Pd ( d)
P(; d) =
n
2
(2) 2 |Pd |1/2
(7.39)
E{d} =
Z
Z
d)
T P(; d) d1 , , dn = Pd(7.41)
Cov{d} =
( d)(
Hence, d and Pd that parametrize the normal density function P(; d) represent
the mean and the covariance matrix.
Exercise: Verify that, with
2
m
1
1
2
1
d =
; Pd =
m2
1 2
22
(7.42)
one obtains the expression for normal density of a 2-dimensional vector shown
earlier.
NOTE: Use of SVD for Visualization of Normal Density
Covariance matrix Pd contains information about the spread (i.e., extent of
deviation from the mean) for each element and their correlations. For instance,
Var{d` } = [Cov{d}]`,`
{d` , dm } =
(7.43)
[Cov{d}]`,m
q
[Cov{d}]`,` [Cov{d}]m,m
(7.44)
where []i,j represents the (i, j)th element of the matrix. However, one still has
hard time understanding the correlations among all the elements and visualizing the overall shape of the density function. Here, the SVD can be useful.
Because Pd is a symmetric matrix, it has the following SVD:
d)
T}
Pd = E{(d d)(d
= V V
v1
(7.45)
vn
(7.46)
1
..
v1T
..
.
vnT
n
(7.47)
36
1
..
d)
TV } =
E{V T (d d)(d
.
n
(7.48)
E{(d d )(d d )T } =
1
..
.
n
(7.49)
20.2 19.8
19.8 20.2
"
2
2
2
2
2
2
22
10 0
0 0.1
"
2
2
2
2
2
2
2
2
(7.50)
37
=
=
R +
lim0 P(, ; x, y)d
Z Z +
P(, ; x, y)d d
|
{z
}
normalization factor
P(, ; x, y)
R
P(, ; x, y)d
P(, ; x, y)
P(, y)
(7.51)
(7.52)
(7.53)
Note:
The above means
(7.54)
P(|; x|y) d = 1
(7.55)
(7.56)
(7.57)
if and only if
This means that the conditional density is same as the marginal density
when and only when x and y are independent.
We are interested in the conditional density, because often some of the
random variables are measured while others are not. For a particular trial,
if x is not measurable, but y is, we are intersted in knowing P(|; x|y) for
estimation of x.
Finally, note the distinctions among different density functions:
38
b2
a2
b1
a1
P(, ; x, y)
P(, y)
(7.61)
P(, ; x, y)
P(, x)
(7.62)
Bayes Rule:
Note that
P(|; x|y) =
P(|; y|x) =
P(, ; x, y)
P(, y)
P(, ; x, y)
P(, x)
(7.63)
(7.64)
Hence, we arrive at
P(|; x|y) =
P(|; y|x)P(, x)
P(, y)
(7.65)
(7.66)
(7.67)
39
Bayes Rule is useful, since in many cases, we are trying to compute P(|; x|y)
and its difficult to obtain the expression for it directly, while it may be easy
to write down the expression for P(|; y|x).
We can define the concepts of conditional expectation and conditional covariance using the conditional density. For instance, the conditional expectation of x given y = is defined as
Z
E{x|y} =
P(|; x|y)d
(7.68)
(7.69)
(7.70)
#)
1
x
2
( x
)( y)
y 2
exp
2
+
2(1 2 )
x
x y
y
P(, ; x, y) =
2 x
1 y 2
q
(7.72)
exp
2
y
2y2
|
{z
}
marginal density of y
!2
x
)
1
1
p
p y
exp
2
2x2 (1 2 )
x 1 2
|
{z
}
conditional density of x
(
)
1
1 x
2
p
exp
(7.73)
2
x
2x2
{z
}
|
marginal density of x
!2
y
1
1
p x
q
exp
2
y 1 2
2y2 (1 2 )
{z
}
|
conditional density of y
1
40
Hence,
P(|; x|y) =
P(|; y|x) =
!2
)
1
1
p
p y
(7.74)
exp
2
2x2 (1 2 )
x 1 2
!2
y
1
1
p x
q
exp
(7.75)
2
y 1 2
2 2 (1 2 )
y
Note that the above conditional densities are normal. For instance, P(|; x|y)
is a normal density with mean of x
+ xy ( y) and variance of x2 (1 2 ).
So,
x
E{x|y} = x
+ ( y)
(7.76)
y
x y
( y)
(7.77)
= x
+
y2
= E{x} + Cov{x, y}Var1 {y}( y)
(7.78)
(7.79)
(7.80)
1
(x y )
y2
(7.81)
(7.83)
41
..
E{x|y} =
. P(|; x|y) d1 , , dn
n
Z
(7.85)
T
1 E{x1 |y}
1 E{x1 |y}
..
..
Cov{x|y} =
P(|; x|y) d1 , , dn
.
.
n E{xn |y}
n E{xn |y}
(7.86)
Z
x
y
(7.87)
1
(2)
n+m
2
|Pz |1/2
1
exp ( z)T Pz1 ( z)
2
(7.88)
where
z =
Pz =
; =
(7.89)
Cov(x) Cov(x, y)
Cov(y, x) Cov(y)
(7.90)
(7.91)
(x)( x
)
(7.92)
o
n
(7.93)
and
(7.94)
(7.95)
(7.96)
42
7.1.3
STATISTICS
PREDICTION
The first problem of statistics is prediction of the outcome of a future trial
given a probabilistic model.
Suppose P(x), the probability density for random variable x, is
given. Predict the outcome of x for a new trial (which is about to
occur).
Note that, unless P(x) is a point distribution, x cannot be predicted exactly.
2
x
= arg min E kx x
k2
x
If a related variable y (from the same trial) is given, then one should use
x
= E{x|y} instead.
SAMPLE MEAN AND COVARIANCE, PROBABILISTIC MODEL
The other problem of statistics is inferring a probabilistic model from collected
data. The simplest of such problems is the following:
We are given the data for random variable x from N trials. These
data are labeled as x(1), , x(N ). Find the probability density
function for x.
Often times, a certain density shape (like normal distribution) is assumed to
make it a well-posed problem. If a normal density is assumed, the following
sample averages can then be used as estimates for the mean and covariance:
=
x
N
1 X
x(i)
N
i=1
N
1 X
Rx =
x(i)xT (i)
N
i=1
43
Note that the above estimates are consistent estimates of real mean and covariance x
and Rx (i.e., they converge to true values as N ).
A slightly more general problem is:
7.2
STOCHASTIC PROCESSES
A stochastic process refers to a family of random variables indexed by a parameter set. This parameter set can be continuous or discrete. Since we are
interested in discrete systems, we will limit our discussion to processes with
a discrete parameter set. Hence, a stochastic process in our context is a time
sequence of random variables.
7.2.1
DISTRIBUTION FUNCTION
Let x(k) be a sequence. Then, (x(k1 ), , x(k` )) form an `-dimensional random variable. Then, one can define the finite dimensional distribution function and the density function as before. For instance, the distribution function
F (1 , , ` ; x(k1 ), , x(k` )), is defined as:
F (1 , , ` ; x(k1 ), , x(k` )) = Pr{x(k1 ) 1 , , x(k` ) ` }
(7.97)
44
x
(k) = E{x(k)} =
Its covariance is defined as
dF (; x(k))
(7.98)
Rx (k1 , k2 ) = E{[x(k
(k1 )][x(k2 ) x
(k2 )]T }
R R 1 ) x
= [1 x
(k1 )][2 x
(k2 )]T dF (1 , 2 ; x(k1 ), x(k2 ))
(7.99)
The cross-covariance of two stochastic processes x(k) and y(k) are defined as
Rxy (k1 , k2 ) = E{[x(k
(k1 )][y(k2 ) y(k2 )]T }
R R 1 ) x
(k1 )][2 y(k2 )]T dF (1 , 2 ; x(k1 ), y(k2 ))
= [1 x
(7.100)
Gaussian processes refer to the processes of which any finite-dimensional distribution function is normal. Gaussian processes are completely characterized
by the mean and covariance.
STATIONARY STOCHASTIC PROCESSES
Throughout this book we will define stationary stochastic processes as those
with time-invariant distribution function. Weakly stationary (or stationary
in a wide sense) processes are processes whose first two moments are timeinvariant. Hence, for a weakly stationary process x(k),
E{x(k)} = x
k
T
E{[x(k) x
][x(k ) x
] } = Rx ( ) k
(7.101)
In other words, if x(k) is stationary, it has a constant mean value and its
covariance depends only on the time difference . For Gaussian processes,
weakly stationary processes are also stationary.
For scalar x(k), R(0) can be interpreted as the variance of the signal and
)
reveals its time correlation. The normalized covariance R(
R(0) ranges from
0 to 1 and indicates the time correlation of the signal. The value of 1 indicates
a complete correlation and the value of 0 indicates no correlation.
R( )
R(0)
Note that many signals have both deterministic and stochastic components. In some applications, it is very useful to treat these signals in the same
framework. One can do this by defining
P
x
= limN N1 N
k=1 x(k)
(7.102)
1 PN
Rx ( ) = limN N k=1 [x(k) x
][x(k ) x
] T
Note that in the above, both deterministic and stochastic parts are averaged out. The signals for which the above limits converge are called quasistationary signals. The above definitions are consistent with the previous
45
definitions since, in the purely stochastic case, a particular realization of a stationary stochastic process with given mean ($\bar{x}$) and covariance ($R_x(\tau)$) should satisfy the above relationships.
The spectrum (spectral density) of a stationary signal is defined as
$$\Phi_x(\omega) = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty}R_x(\tau)e^{-j\omega\tau} \tag{7.103}$$
The area under the curve represents the power of the signal in the corresponding frequency range. For example, the power of x(k) in the frequency range $(\omega_1,\omega_2)$ is calculated by the integral
$$\int_{\omega=\omega_1}^{\omega_2}\Phi_x(\omega)\,d\omega \tag{7.104}$$
The covariance is recovered from the spectrum through the inverse transform
$$R_x(\tau) = \int_{-\pi}^{\pi}\Phi_x(\omega)e^{j\omega\tau}\,d\omega \tag{7.105}$$
In particular,
$$E\{x(k)x^T(k)\} = R_x(0) = \int_{-\pi}^{\pi}\Phi_x(\omega)\,d\omega$$
which indicates that the total area under the spectral density is equal to the variance of the signal. This is known as Parseval's relationship.
Example: Show plots of various covariances, spectra, and realizations!
Exercise: Plot the spectra of (1) white noise, (2) sinusoids, and (3) white noise filtered through a low-pass filter.
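In the spirit of the exercise above, the following sketch (assuming numpy and scipy are available; the filter and its coefficients are illustrative choices) compares the estimated spectra of white noise before and after low-pass filtering.

```python
# Sketch: spectrum of white noise vs. low-pass-filtered white noise.
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
N = 2**14
e = rng.standard_normal(N)                 # zero-mean white noise, variance 1

a = 0.9
y = signal.lfilter([1 - a], [1, -a], e)    # first-order low-pass filtered noise

# Periodogram estimates of the spectral density
w_e, Pe = signal.periodogram(e)
w_y, Py = signal.periodogram(y)

# White noise has a flat spectrum; the filtered noise has its power
# concentrated at low frequencies, shaped by |H(e^{jw})|^2.
print(Pe.mean(), Py[:10].mean(), Py[-10:].mean())
```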
A white noise sequence is completely uncorrelated in time:
$$E\left\{(x(k)-\bar{x})(x(k-\tau)-\bar{x})^T\right\} = \begin{cases} R_x & \text{if } \tau = 0\\ 0 & \text{if } \tau \ne 0 \end{cases} \tag{7.107}$$
A general stationary disturbance d(k) can then be modeled by filtering a zero-mean white noise sequence ε(k):
$$d(k) = H(q)\varepsilon(k) + \bar{d} \tag{7.109}$$
Here $\bar{d}$ is arbitrary and H(q) has no pole or zero outside the unit disk. In other words, the first and second order moments of any stationary signal can be matched by the above model.
This result is very useful in modeling disturbances whose covariance functions are known or fixed. Note that a stationary Gaussian process is completely
specified by its mean and covariance. Such a process can be modelled by filtering a zero-mean Gaussian white sequence through appropriate dynamics
determined by its spectrum (plus adding a bias at the output if the mean is
not zero).
[Figure: equivalent block-diagram realizations of a mean-shifting disturbance. White noise ε(k) is filtered through H(q⁻¹) and an integrator 1/(1 − q⁻¹) to produce y(k); the order of the filter and the integrator can be interchanged.]
Consider a state-space model driven by a zero-mean white noise sequence ε(k) with covariance $R_\varepsilon$:
$$x(k+1) = Ax(k) + B\varepsilon(k), \qquad y(k) = Cx(k) + D\varepsilon(k) \tag{7.111}$$
The state covariance propagates according to
$$R_x(k+1) = AR_x(k)A^T + BR_\varepsilon B^T \tag{7.112}$$
If all the eigenvalues of A are strictly inside the unit disk, the above approaches a stationary process as $k\to\infty$ since
$$\lim_{k\to\infty}E\{x(k)\} = 0 \tag{7.113}$$
$$\lim_{k\to\infty}E\{x(k)x^T(k)\} = \bar{R}_x \tag{7.114}$$
where $\bar{R}_x$ solves the Lyapunov equation $\bar{R}_x = A\bar{R}_xA^T + BR_\varepsilon B^T$.
$$R_y(\tau) = E\{y(k+\tau)y^T(k)\} = \begin{cases} C\bar{R}_xC^T + DR_\varepsilon D^T & \text{for } \tau = 0\\ CA^\tau\bar{R}_xC^T + CA^{\tau-1}BR_\varepsilon D^T & \text{for } \tau > 0 \end{cases} \tag{7.116}$$
The spectrum of y is obtained by taking the Fourier transform of $R_y(\tau)$ and can be shown to be
$$\Phi_y(\omega) = \left[C(e^{j\omega}I-A)^{-1}B+D\right]R_\varepsilon\left[C(e^{j\omega}I-A)^{-1}B+D\right]^* \tag{7.117}$$
where $(\cdot)^*$ denotes the conjugate transpose.
In the case that A contains eigenvalues on or outside the unit circle, the process is nonstationary, as its covariance keeps increasing (see Eqn. (7.112)). However, it is common to include integrators in A to model mean-shifting (random-walk-like) behavior. If all the outputs exhibit this behavior, one can use
$$x(k+1) = Ax(k) + B\varepsilon(k), \qquad \Delta y(k) = Cx(k) + D\varepsilon(k) \tag{7.118}$$
Note that, with a stable A, while Δy(k) is a stationary process, y(k) includes an integrator and therefore is nonstationary.
[Figure: a stationary process is generated by passing white noise ε(k) through the stable system x(k+1) = Ax(k) + Bε(k), y(k) = Cx(k) + Dε(k); a nonstationary (mean-shifting) process is generated by passing the stable system's output through the integrator 1/(1 − q⁻¹).]
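The construction in the figure is easy to reproduce numerically. The sketch below (illustrative, not from the book) filters white noise through a stable first-order system and then integrates the result to produce a mean-shifting, random-walk-like signal.

```python
# Sketch: stationary filtered noise vs. its integrated (nonstationary) version.
import numpy as np

rng = np.random.default_rng(2)
N = 1000
eps = rng.standard_normal(N)

# Stable first-order filter 1/(1 - 0.8 q^-1): output remains bounded
y_stat = np.zeros(N)
for k in range(1, N):
    y_stat[k] = 0.8 * y_stat[k - 1] + eps[k]

# Integrator 1/(1 - q^-1): cumulative sum exhibits random-walk-like mean shifts
y_ns = np.cumsum(y_stat)
print(y_stat.std(), y_ns[:100].std(), y_ns.std())   # variance keeps growing
```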
Chapter 6
Unconstrained Quadratic Optimal Control
In this chapter, we present the basic results in linear quadratic optimal control.
We will derive optimal and suboptimal control policies for both finite horizon
and infinite horizon problems. We will consider the unconstrained case in
this chapter and then extend the results to the constrained case in the next
chapter.
The standard problem of quadratic optimal control is that of regulating
the state at the origin (or driving the state to the origin starting from some
nonzero initial condition). For the linear system
x(k + 1) = Ax(k) + Bu(k)
(6.1)
with a given initial condition x(0), the objective is to find an input sequence
or a feedback policy that minimizes the quadratic cost function of
$$V_p = \sum_{k=0}^{p-1}\left[x^T(k)Qx(k)+u^T(k)Ru(k)\right] + x^T(p)Q_tx(p) \tag{6.2}$$
6.1 Finite Horizon Problem

The finite horizon optimal control problem is
$$\min_{u(0),\ldots,u(p-1)}V_p \tag{6.3}$$
subject to the system dynamics
$$x(k+1) = Ax(k)+Bu(k) \tag{6.4}$$
and the initial condition
$$x(0) = x_0 \tag{6.5}$$
6.1.1 Solution Via Least Squares
The future state sequence can be related to the future input sequence through
the following linear equation:
$$\begin{bmatrix} x(0)\\ x(1)\\ \vdots\\ x(p) \end{bmatrix} = \begin{bmatrix} I\\ A\\ \vdots\\ A^p \end{bmatrix}x(0) + \begin{bmatrix} 0 & \cdots & \cdots & 0\\ B & 0 & \cdots & 0\\ AB & B & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ A^{p-1}B & \cdots & AB & B \end{bmatrix}\begin{bmatrix} u(0)\\ \vdots\\ u(p-1) \end{bmatrix} \tag{6.6}$$
Denote the above as
$$X = S^xx(0) + S^uU \tag{6.7}$$
In terms of the stacked vectors, the objective function is
$$V_p = X^T\bar{Q}X + U^T\bar{R}U \tag{6.8}$$
where $\bar{Q} = \mathrm{blockdiag}\{Q,\ldots,Q,Q_t\}$ and $\bar{R} = \mathrm{blockdiag}\{R,\ldots,R\}$. Substituting (6.7) into the objective function (6.8) yields
$$V_p = (S^xx(0)+S^uU)^T\bar{Q}(S^xx(0)+S^uU) + U^T\bar{R}U = U^T\underbrace{\left(S^{uT}\bar{Q}S^u+\bar{R}\right)}_{H}U + 2U^T\underbrace{S^{uT}\bar{Q}S^xx_0}_{g} + x_0^TS^{xT}\bar{Q}S^xx_0 \tag{6.9}$$
By applying Lemma ?? of Chapter ??, we find that the solution that minimizes $V_p$ is given by
$$U^* = -H^{-1}g = -\left(S^{uT}\bar{Q}S^u+\bar{R}\right)^{-1}S^{uT}\bar{Q}S^xx_0 \tag{6.10}$$
Also, from the same lemma, the optimal cost is
$$J_p(x_0) = -g^TH^{-1}g + x_0^TS^{xT}\bar{Q}S^xx_0 = x_0^TS^{xT}\left[\bar{Q}-\bar{Q}S^u\left(S^{uT}\bar{Q}S^u+\bar{R}\right)^{-1}S^{uT}\bar{Q}\right]S^xx_0 \tag{6.11}$$
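The batch least-squares solution is direct to implement. The sketch below (a minimal illustration; the system matrices are assumptions, and the names Sx, Su, H, g mirror the text) builds the prediction equation (6.6) and evaluates the optimal input sequence (6.10).

```python
# Sketch: explicit least-squares solution of the finite horizon LQ problem.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2); R = np.array([[0.1]]); Qt = np.eye(2)
p = 20
n, m = B.shape

# Sx = [I; A; ...; A^p], Su = block lower-triangular matrix of A^i B (eq. 6.6)
Sx = np.vstack([np.linalg.matrix_power(A, i) for i in range(p + 1)])
Su = np.zeros(((p + 1) * n, p * m))
for i in range(1, p + 1):
    for j in range(i):
        Su[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i-1-j) @ B

Qbar = np.kron(np.eye(p + 1), Q); Qbar[-n:, -n:] = Qt
Rbar = np.kron(np.eye(p), R)

x0 = np.array([1.0, 0.0])
H = Su.T @ Qbar @ Su + Rbar           # Hessian
g = Su.T @ Qbar @ Sx @ x0             # linear term
U = -np.linalg.solve(H, g)            # U* = -H^{-1} g   (eq. 6.10)
print(U[:3])
```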
6.1.2 Solution Via Dynamic Programming
A more elegant way to solve the same problem is by using dynamic programming. The idea is to recast the problem as a series of one-step optimal control problems. The following definition is useful:
Definition 1 The optimal cost for the k-step problem
$$J_k(z) = \min_{u(p-k),\ldots,u(p-1)}\left[x^T(p)Q_tx(p) + \sum_{i=p-k}^{p-1}x^T(i)Qx(i)+u^T(i)Ru(i)\right]$$
for the system $x(i+1) = Ax(i)+Bu(i)$ and the initial condition $x(p-k) = z$ is called the k-step cost-to-go for z. It is the minimum cost incurred to solve the k-step optimal control problem starting from the state z.
Because of the particular structure of the state-space system, the optimal decision for u(p−1) depends only on x(p−1), not on x(p−2), x(p−3), and so on. Based on this, we start with the following one-step-ahead problem posed at time p−1:
$$J_1(x(p-1)) = \min_{u(p-1)}\left[x^T(p)S(p)x(p) + x^T(p-1)Qx(p-1) + u^T(p-1)Ru(p-1)\right] \tag{6.12}$$
subject to
$$x(p) = Ax(p-1)+Bu(p-1), \qquad S(p) = Q_t \tag{6.13}$$
The minimizing input is
$$u(p-1) = -\underbrace{(B^TS(p)B+R)^{-1}B^TS(p)A}_{L(p-1)}\,x(p-1) \tag{6.15}$$
and the resulting optimal cost is
$$J_1(x(p-1)) = x^T(p-1)S(p-1)x(p-1) \tag{6.16}$$
where
$$S(p-1) = A^TS(p)A + Q - A^TS(p)B(B^TS(p)B+R)^{-1}B^TS(p)A \tag{6.17}$$
At the next stage, consider solving the two-step-ahead problem posed at time p−2:
$$J_2(x(p-2)) = \min_{u(p-1),u(p-2)}\left[x^T(p)S(p)x(p) + \sum_{i=p-2}^{p-1}x^T(i)Qx(i)+u^T(i)Ru(i)\right] \tag{6.18}$$
The above is equivalent to the following one-stage problem with the cost-to-go inherited from the previous stage:
$$J_2(x(p-2)) = \min_{u(p-2)}\left[x^T(p-1)S(p-1)x(p-1) + x^T(p-2)Qx(p-2) + u^T(p-2)Ru(p-2)\right] \tag{6.19}$$
whose solution is
$$u(p-2) = -\underbrace{(B^TS(p-1)B+R)^{-1}B^TS(p-1)A}_{L(p-2)}\,x(p-2) \tag{6.20}$$
$$J_2(x(p-2)) = x^T(p-2)S(p-2)x(p-2) \tag{6.21}$$
where
$$S(p-2) = A^TS(p-1)A + Q - A^TS(p-1)B(B^TS(p-1)B+R)^{-1}B^TS(p-1)A \tag{6.22}$$
Continuing on with this (i.e., successively solving $J_{p-k}(x(k))$, $k = p-1,\ldots,0$, and propagating the cost-to-go), we obtain the following optimal sequence for the original problem of (6.3):
$$u(k) = -L(k)x(k), \qquad k = p-1,\ldots,0 \tag{6.23}$$
where
$$L(k) = (B^TS(k+1)B+R)^{-1}B^TS(k+1)A \tag{6.24}$$
and
$$S(k) = A^TS(k+1)A + Q - A^TS(k+1)B(B^TS(k+1)B+R)^{-1}B^TS(k+1)A \tag{6.25}$$
The above equation, called the Discrete-Time Riccati Equation or Riccati Difference Equation, is initialized with $S(p) = Q_t$ and is solved backward, i.e., starting with S(p) and solving for S(p−1), etc. The optimal cost for the p-stage problem is
$$J_p(x_0) = x_0^TS(0)x_0 \tag{6.26}$$
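The backward recursion is equally compact in code. The following sketch (illustrative matrices, not from the book) implements the Riccati difference equation (6.25) with the gains (6.24) and evaluates the optimal cost (6.26).

```python
# Sketch: backward Riccati recursion for the finite horizon LQ problem.
import numpy as np

def riccati_recursion(A, B, Q, R, Qt, p):
    """Return gains L(0..p-1) and cost matrices S(0..p), solved backward."""
    S = [None] * (p + 1)
    L = [None] * p
    S[p] = Qt                                    # initialization S(p) = Qt
    for k in range(p - 1, -1, -1):
        Sk1 = S[k + 1]
        L[k] = np.linalg.solve(B.T @ Sk1 @ B + R, B.T @ Sk1 @ A)   # (6.24)
        S[k] = A.T @ Sk1 @ A + Q - A.T @ Sk1 @ B @ L[k]            # (6.25)
    return L, S

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2); R = np.array([[0.1]]); Qt = np.eye(2)
L, S = riccati_recursion(A, B, Q, R, Qt, p=20)
x0 = np.array([1.0, 0.0])
print(x0 @ S[0] @ x0)     # optimal cost Jp(x0) = x0^T S(0) x0   (6.26)
```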
6.1.3 Comparison of the Two Approaches

Note that the same recursion applies to the problem posed at any intermediate time k:
$$J_{p-k}(x(k)) = \min_{u(k),\ldots,u(p-1)}\left[\sum_{i=k}^{p-1}x^T(i)Qx(i)+u^T(i)Ru(i) + x^T(p)Q_tx(p)\right] \tag{6.27}$$
This also gives the same optimal feedback policy between u(k) and x(k).
Dynamic programming turns a multi-step problem into multiple single-step problems, and the feedback gain (6.23) is determined through a recursion (6.25) involving matrices of low dimension. On the other hand, the explicit approach requires the inversion of the Hessian matrix H, which is of very large dimension if the horizon is large. Special algorithms may be needed to take advantage of the structure of the Hessian matrix.
When constraints are imposed on the inputs and/or the state, the least squares problem does not yield an analytical solution and numerical optimization must be employed. In this case, the dynamic programming approach is no longer feasible. On the other hand, the least squares approach can be made viable by using some numerical optimization strategy to solve the constrained least squares problem $J_{p-k}(x(k))$ on-line for a specific value of x(k) at each sample time. From the optimal input sequence for $J_{p-k}(x(k))$, one would implement only u(k) and discard the rest. This strategy is referred to as receding horizon control.
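A minimal receding horizon loop for the unconstrained case, where only the first gain of the p-step problem is ever needed, might look as follows (an illustrative sketch, not the book's code).

```python
# Sketch: receding horizon control, unconstrained case.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2); R = np.array([[0.1]]); Qt = np.eye(2)
p = 20

# After p backward iterations, L0 is the gain for the first move of the
# p-step problem; for a time-invariant problem it can be computed once.
S = Qt
for _ in range(p):
    L0 = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)
    S = A.T @ S @ A + Q - A.T @ S @ B @ L0

x = np.array([1.0, 0.0])
for k in range(50):
    u = -L0 @ x            # implement only u(k); discard the rest
    x = A @ x + B @ u
print(np.linalg.norm(x))   # state is regulated toward the origin
```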
6.2 Infinite Horizon Problem

The infinite horizon problem is
$$J_\infty(x_0) = \min_{u(\cdot)}V_\infty = \min_{u(\cdot)}\sum_{k=0}^{\infty}x^T(k)Qx(k)+u^T(k)Ru(k) \tag{6.28}$$
6.2.1 Optimal Solution
Since the prediction must be carried out to infinity, the calculation of the
optimal input sequence through the explicit least squares method becomes
impossible. On the other hand, derivation of the optimal feedback law through
the dynamic programming approach remains viable. Consider the optimal
feedback solution with $p = \infty$:
$$u(k) = -(B^TS(k+1)B+R)^{-1}B^TS(k+1)Ax(k), \qquad k = \infty,\ldots,0 \tag{6.29}$$
The RDE
$$S(k) = A^TS(k+1)A + Q - A^TS(k+1)B(B^TS(k+1)B+R)^{-1}B^TS(k+1)A \tag{6.30}$$
is to be initialized with $S(\infty) = Q$ and solved backward. Let us assume for the moment that the iteration of (6.30) converges to a solution $S_\infty$ within some finite number of iterations. Such $S_\infty$ would then satisfy the Algebraic Riccati Equation (ARE)
$$S_\infty = A^TS_\infty A + Q - A^TS_\infty B(B^TS_\infty B+R)^{-1}B^TS_\infty A \tag{6.31}$$
In this case, the optimal control is given by the time-invariant feedback law
$$u(k) = -\underbrace{(B^TS_\infty B+R)^{-1}B^TS_\infty A}_{L_\infty}\,x(k), \qquad k = 0,\ldots,\infty \tag{6.32}$$
and the optimal cost is
$$J_\infty(x_0) = x_0^TS_\infty x_0. \tag{6.33}$$
6.2.2 Receding Horizon Approximation
A practical alternative is to compute, at each sample time, the solution to the finite horizon problem posed with $x_0 = x(k)$:
$$J_p(x(k)) = \min_{u(k),\ldots,u(k+p-1)}\left\{\sum_{i=0}^{p-1}x^T(k+i)Qx(k+i)+u^T(k+i)Ru(k+i)+x^T(k+p)Q_tx(k+p)\right\} \tag{6.34}$$
The solution to the finite horizon problem has already been discussed in
section 6.1. Since only u(k) is implemented from the solution to Jp (x(k)), the
resulting control is a feedback control given by
u(k) = (B T Sp B + R)1 B T Sp A x(k)
{z
}
|
(6.35)
Lp
Ideally, the terminal weight $Q_t$ should be chosen so that the terminal term captures the optimal cost-to-go beyond the horizon:
$$x^T(k+p)Q_tx(k+p) = \min\sum_{j=0}^{\infty}x^T(k+p+j)Qx(k+p+j)+u^T(k+p+j)Ru(k+p+j) \tag{6.37}$$
From the discussion of the infinite horizon optimal control policy given
in section 6.2.1, it is clear that we can compute such Qt by solving the
ARE of
$$Q_t = A^TQ_tA + Q - A^TQ_tB(B^TQ_tB+R)^{-1}B^TQ_tA \tag{6.38}$$
Alternatively, assume that no control is taken beyond the horizon (u(k+i) = 0 for i ≥ p) and that the system is stable. Then the terminal weight should satisfy
$$x^T(k+p)Q_tx(k+p) = \sum_{i=p}^{\infty}x^T(k+i)Qx(k+i) \tag{6.39}$$
Then,
$$\begin{aligned} x^T(k+p)Q_tx(k+p) &= x^T(k+p)Qx(k+p) + \sum_{i=p+1}^{\infty}x^T(k+i)Qx(k+i)\\ &= x^T(k+p)Qx(k+p) + x^T(k+p+1)Q_tx(k+p+1)\\ &= x^T(k+p)Qx(k+p) + x^T(k+p)A^TQ_tAx(k+p) \end{aligned} \tag{6.40}$$
From the above, we can see that $Q_t$ can be chosen as a positive semi-definite solution of the Lyapunov equation
$$A^TQ_tA + Q = Q_t \tag{6.41}$$
Let us assume again that no control is taken beyond the horizon p but
that the system is unstable. The previous formulation cannot be used
since there is no positive semi-definite solution to the Lyapunov equation
(6.41) in this case. To ensure that the infinite horizon cost is finite under
the open-loop assumption, the unstable modes of the system must be
zeroed at k + p. This can be done either through an explicit terminal
constraint in the optimization or by assigning an infinite weight to them
at k + p.
For this purpose we must first find a coordinate transformation that separates the unstable modes $\bar{x}_u$ from the stable modes $\bar{x}_s$:
$$\begin{bmatrix}\bar{x}_s(k)\\ \bar{x}_u(k)\end{bmatrix} = \begin{bmatrix}T_1\\ T_2\end{bmatrix}x(k) \tag{6.43}$$
The states $\bar{x}_s$ and $\bar{x}_u$ in the transformed coordinates are related to the original state according to
$$x(k) = \begin{bmatrix}G_1 & G_2\end{bmatrix}\begin{bmatrix}\bar{x}_s(k)\\ \bar{x}_u(k)\end{bmatrix} \tag{6.44}$$
where the columns of $G_1$ and $G_2$ contain all the eigenvectors and generalized eigenvectors for the stable and unstable eigenvalues, respectively. Hence, the coordinate transformation matrix $\begin{bmatrix}T_1^T & T_2^T\end{bmatrix}^T$ can be obtained by inverting the matrix $\begin{bmatrix}G_1 & G_2\end{bmatrix}$. After the coordinate transformation, we obtain the decoupled system
$$\begin{bmatrix}\bar{x}_s(k+1)\\ \bar{x}_u(k+1)\end{bmatrix} = \begin{bmatrix}A_s & 0\\ 0 & A_u\end{bmatrix}\begin{bmatrix}\bar{x}_s(k)\\ \bar{x}_u(k)\end{bmatrix} + \begin{bmatrix}B_s\\ B_u\end{bmatrix}u(k) \tag{6.45}$$
Then one can solve the Lyapunov equation for the stable subsystem only,
$$A_s^TP_0A_s + G_1^TQG_1 = P_0 \tag{6.46}$$
and use the terminal weight $Q_t = T_1^TP_0T_1$ (6.47) together with the constraint
$$T_2x(k+p) = 0 \tag{6.48}$$
The constraint forces the unstable modes to be zero at the end of the finite horizon.
Finally, we can imagine the more restrictive terminal constraint that forces the whole state x(k+p) to be zero at the end of the horizon. Under this constraint, the infinite horizon problem is equivalent to the finite horizon problem because the cost from time k+p to ∞ is zero.
Note, however, that implementing any one of the mentioned finite receding horizon control alternatives, except the first, yields a control law that is slightly different from the solution of $J_\infty(x(k))$.
Of course, when x(k) is not directly measurable, one can construct $\hat{x}(k|k)$ through a state estimator and use it instead of x(k) to find the control u(k).
6.3 Analysis

6.3.1 State Feedback Case
The infinite horizon and the receding horizon policies mentioned above lead to a linear feedback law u(k) = −Lx(k). Implementation on the system
$$x(k+1) = Ax(k)+Bu(k) \tag{6.49}$$
yields the closed-loop system
$$x(k+1) = (A-BL)x(k) \tag{6.50}$$
For any particular choice of controller design parameters we can check stability
by computing the eigenvalues of $A - BL$. It is desirable to establish conditions
under which the closed loop system is stable and which do not require the
eigenvalues to be checked. This makes the search for appropriate controller
design parameters much easier.
We will show below that the infinite horizon LQR renders the closed loop
system stable for virtually any design parameters under rather mild assumptions. We will show in the next chapter that, in general, the various receding
horizon control policies proposed above lead to closed loop stable systems as
long as the control objective reflects directly or indirectly (through constraints)
the infinite horizon cost.
For an analysis of the infinite horizon control problem, we consider the optimal estimation problem for the dual system
$$x(k+1) = A^Tx(k) + \varepsilon_1(k), \qquad y(k) = B^Tx(k) + \varepsilon_2(k) \tag{6.51}$$
where $\varepsilon_1(k)$ and $\varepsilon_2(k)$ are zero-mean white noise sequences with covariances Q and R, respectively. We can verify that the estimator Riccati equation (??) for the dual system (6.51) is identical to the controller Riccati equation (6.30) for the original system (6.49). (Since the former runs forward in time and the latter backward, set P(0) to $S(\infty)$.) Also, by comparison we can verify that the asymptotic optimal estimator gain (??) for the dual system is identical to the transpose of the asymptotic optimal controller gain (6.32) for the original system. Moreover, the observer poles of the dual system are the poles of the closed-loop system, because the eigenvalues of $(A^T - L_\infty^TB^T)$ are the same as the eigenvalues of $(A - BL_\infty)$.
Based on the duality, we can draw some important properties of the LQR
and the Riccati equation from the analysis of the Kalman filter in section ??.
By applying the result from that discussion to the above dual system, we know
that, if (B T , AT ) is a detectable pair, the observer Riccati difference equation
(??) for the dual system will converge asymptotically upon iteration. This
means, if (A, B) is a stabilizable pair, the control Riccati difference equation
of (6.30) will converge.
We also know that, if $(A^T, Q^{1/2})$ is a stabilizable pair, the converged solution is a stabilizing solution, which is the unique positive semi-definite solution of the observer ARE (??). This means that, if $(Q^{1/2}, A)$ is a detectable pair, the converged solution is a stabilizing solution, which is the unique positive semi-definite solution of the controller ARE (6.31).
In summary,

If $(A, B)$ is a stabilizable pair and $(Q^{1/2}, A)$ is a detectable pair, the controller Riccati difference equation (6.30) with $S(\infty) \ge 0$ converges to the unique positive (semi)-definite solution of the ARE (6.31), and all the eigenvalues of $(A - BL_\infty)$ lie inside the unit disk.
The first condition is clearly necessary for stabilization. The second condition
means that all the unstable modes in the state space should be weighed in the
objective function in a linearly independent manner. This condition ensures
that, if the infinite horizon objective function remains bounded, $x \to 0$ and $u \to 0$.
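In practice, the converged ARE solution can be computed directly. The sketch below (assuming scipy is available; the matrices are illustrative, with one unstable but stabilizable mode) solves the ARE (6.31), forms $L_\infty$ as in (6.32), and verifies that the eigenvalues of $A - BL_\infty$ lie inside the unit disk.

```python
# Sketch: infinite horizon LQR via the discrete algebraic Riccati equation.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.2, 0.1], [0.0, 0.9]])   # one unstable mode
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])

# (A, B) stabilizable and (Q^{1/2}, A) detectable here, so the stabilizing
# solution of the ARE exists and is the unique positive (semi)-definite one.
S = solve_discrete_are(A, B, Q, R)
L = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)    # L_inf  (6.32)

eig = np.linalg.eigvals(A - B @ L)
print(np.abs(eig))       # all moduli < 1: the closed loop is stable
```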
6.3.2 Output Feedback Case

Suppose the state feedback gain is implemented with a state estimate,
$$u(k) = -L\hat{x}(k|k) \tag{6.52}$$
where $\hat{x}(k|k)$ is generated by a state observer with gain K. Defining the estimation error $x_e(k) = x(k) - \hat{x}(k|k)$ (6.53), the closed-loop dynamics become
$$\begin{bmatrix}x(k+1)\\ x_e(k+1)\end{bmatrix} = \begin{bmatrix}A-BL & BL\\ 0 & A-KCA\end{bmatrix}\begin{bmatrix}x(k)\\ x_e(k)\end{bmatrix} \tag{6.54}$$
Note that the above equation is only one-way coupled, and the eigenvalues of the closed-loop system's transition matrix are composed of those of $(A-BL)$ and $(A-KCA)$. An important implication is that closed-loop stability of an observer-based output feedback controller can be guaranteed by designing the state feedback regulator component and the state estimator component to be stable on an individual basis. In other words, there is a kind of separation between the regulator and the observer in terms of stability for unconstrained linear systems.
6.4 Control of Stochastic Systems

In practice, measurements may be corrupted with random errors. Also, unmeasured disturbances and other random errors will always be present. Hence, it is of interest to consider the same problem posed for the stochastic system
$$x(k+1) = Ax(k)+Bu(k)+\varepsilon_1(k), \qquad y(k) = Cx(k)+\varepsilon_2(k) \tag{6.55}$$
6.4.1 State Feedback Case

Consider first the system
$$x(k+1) = Ax(k)+Bu(k)+\varepsilon_1(k) \tag{6.56}$$
where $\varepsilon_1$ is a zero-mean white noise sequence with covariance $R_1$. The state is assumed to be measured fully and exactly.
Open-Loop Optimal Solution Via Least Squares
Let us examine, for the above system (with initial condition $x(0) = x_0$), the following open-loop quadratic optimal control problem:
$$\min_{u(0),\ldots,u(p-1)}E\left\{\sum_{k=0}^{p-1}x^T(k)Qx(k)+u^T(k)Ru(k)+x^T(p)Q_tx(p)\right\} \tag{6.57}$$
As before,
$$\begin{bmatrix}x(0)\\ x(1)\\ \vdots\\ x(p)\end{bmatrix} = \begin{bmatrix}I\\ A\\ \vdots\\ A^p\end{bmatrix}x(0) + \begin{bmatrix}0 & \cdots & \cdots & 0\\ B & 0 & \cdots & 0\\ AB & B & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ A^{p-1}B & \cdots & AB & B\end{bmatrix}\begin{bmatrix}u(0)\\ \vdots\\ u(p-1)\end{bmatrix} + \begin{bmatrix}0 & \cdots & \cdots & 0\\ I & 0 & \cdots & 0\\ A & I & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ A^{p-1} & \cdots & A & I\end{bmatrix}\begin{bmatrix}\varepsilon_1(0)\\ \vdots\\ \varepsilon_1(p-1)\end{bmatrix} \tag{6.58}$$
Denote the above as
$$X = S^xx(0) + S^uU + S^\varepsilon E \tag{6.59}$$
Then,
$$\begin{aligned}V_p &= E\left\{X^T\bar{Q}X + U^T\bar{R}U\right\}\\ &= E\left\{(S^xx(0)+S^uU+S^\varepsilon E)^T\bar{Q}(S^xx(0)+S^uU+S^\varepsilon E)\right\} + U^T\bar{R}U\\ &= (S^xx(0)+S^uU)^T\bar{Q}(S^xx(0)+S^uU) + U^T\bar{R}U + E\left\{E^TS^{\varepsilon T}\bar{Q}S^\varepsilon E\right\}\end{aligned} \tag{6.60}$$
With $\bar{R}_1$ denoting the covariance of E, this becomes
$$V_p = (S^xx(0)+S^uU)^T\bar{Q}(S^xx(0)+S^uU) + U^T\bar{R}U + \mathrm{trace}\{S^{\varepsilon T}\bar{Q}S^\varepsilon\bar{R}_1\} \tag{6.61}$$
The minimizing input sequence is the same as in the deterministic case,
$$U^* = -\left(S^{uT}\bar{Q}S^u+\bar{R}\right)^{-1}S^{uT}\bar{Q}S^xx_0 \tag{6.62}$$
and the optimal open-loop cost is
$$J_p^o(x_0) = x_0^TS^{xT}\left[\bar{Q}-\bar{Q}S^u\left(S^{uT}\bar{Q}S^u+\bar{R}\right)^{-1}S^{uT}\bar{Q}\right]S^xx_0 + \mathrm{trace}\{S^{\varepsilon T}\bar{Q}S^\varepsilon\bar{R}_1\} \tag{6.63}$$
Alternatively, dynamic programming can be applied. Consider the one-step-ahead problem posed at time p−1:
$$J_1(x(p-1)) = \min_{u(p-1)}E\left\{x^T(p)S(p)x(p)+x^T(p-1)Qx(p-1)+u^T(p-1)Ru(p-1)\right\} \tag{6.64}$$
$$x(p) = Ax(p-1)+Bu(p-1)+\varepsilon_1(p-1), \qquad S(p) = Q_t \tag{6.65}$$
Substituting the state equation in (6.65) into the objective function (6.64), we obtain
$$J_1(x(p-1)) = \min_{u(p-1)}\Big[(Ax(p-1)+Bu(p-1))^TS(p)(Ax(p-1)+Bu(p-1)) + x^T(p-1)Qx(p-1)+u^T(p-1)Ru(p-1) + \mathrm{trace}\{S(p)R_1\}\Big] \tag{6.66}$$
The first two terms are the same as those for the deterministic system. Furthermore, the last term is $\mathrm{trace}\{S(p)R_1\}$, which is a constant. Hence, the optimal solution is the same as in the deterministic case:
$$u(p-1) = -\underbrace{(B^TS(p)B+R)^{-1}B^TS(p)A}_{L(p-1)}\,x(p-1) \tag{6.67}$$
The optimal cost is
$$J_1(x(p-1)) = x^T(p-1)S(p-1)x(p-1) + \mathrm{trace}\{S(p)R_1\} \tag{6.68}$$
where
$$S(p-1) = A^TS(p)A + Q - A^TS(p)B(B^TS(p)B+R)^{-1}B^TS(p)A \tag{6.69}$$
At the next stage, consider solving the two-step-ahead problem posed at time p−2. The above is in the same form as (6.64) and the same argument yields
$$J_2(x(p-2)) = x^T(p-2)S(p-2)x(p-2) + \sum_{j=p-1}^{p}\mathrm{trace}\{S(j)R_1\} \tag{6.71}$$
$$u(p-2) = -\underbrace{(B^TS(p-1)B+R)^{-1}B^TS(p-1)A}_{L(p-2)}\,x(p-2) \tag{6.72}$$
where
$$S(p-2) = A^TS(p-1)A + Q - A^TS(p-1)B(B^TS(p-1)B+R)^{-1}B^TS(p-1)A \tag{6.73}$$
By induction, we obtain the optimal input sequence for the original problem of (6.57):
$$u(k) = -(B^TS(k+1)B+R)^{-1}B^TS(k+1)Ax(k), \qquad k = p-1,\ldots,0, \tag{6.74}$$
where
$$S(k) = A^TS(k+1)A + Q - A^TS(k+1)B(B^TS(k+1)B+R)^{-1}B^TS(k+1)A \tag{6.75}$$
with $S(p) = Q_t$. This is the same control law we derived for the deterministic system. The optimal cost (i.e., the achievable cost for feedback control) is
$$J_p(x_0) = x_0^TS(0)x_0 + \sum_{j=1}^{p}\mathrm{trace}\{S(j)R_1\} \tag{6.76}$$
More precisely, the closed-loop optimal cost is defined through nested minimizations, solved backward via dynamic programming:
$$J_p(x(k)) = \min_{u(k)}E\left\{x^T(k)Qx(k)+u^T(k)Ru(k)+J_{p-1}(x(k+1))\right\} \tag{6.77}$$
$$\vdots$$
$$J_1(x(k+p-1)) = \min_{u(k+p-1)}E\left\{x^T(k+p)Q_tx(k+p)+x^T(k+p-1)Qx(k+p-1)+u^T(k+p-1)Ru(k+p-1)\right\}$$
Note that the min operator and the E operator do not commute in general. In the special case of the linear quadratic problem we consider, however, the optimal solution for the stochastic system happens to be the same as that for the deterministic system. In other words, for the particular case of linear quadratic optimal control, OLOFC (open-loop optimal feedback control) is equivalent to the optimal feedback control. This is an exception rather than a rule, however.
The dynamic programming solution can be easily extended to the infinite horizon problem (where $p = \infty$). Using the limit argument, it can be shown that the optimal feedback law for the infinite horizon stochastic problem is the ∞-horizon LQR discussed for the deterministic case, (6.31)–(6.32).
6.4.2 Output Feedback Case

This time we consider the case where we have only partial, noise-corrupted measurements of the state. Hence, rather than having the measurement of the full state x, we have measurement of the output y, which contains partial information about x:
$$y(k) = Cx(k)+\varepsilon_2(k) \tag{6.79}$$
It is assumed that:
1. x(0) is a Gaussian variable of mean $\bar{x}_0$ and covariance $R_0$.
2. $\varepsilon_1(k)$ and $\varepsilon_2(k)$ are zero-mean white Gaussian sequences of covariances $R_1$ and $R_2$, respectively.
Optimal Output Feedback Controller
Let us look at the problem of deriving an optimal output feedback policy based
on the quadratic index
$$V_p = E\left\{\sum_{i=0}^{p-1}x^T(i)Qx(i)+u^T(i)Ru(i)+x^T(p)Q_tx(p)\right\} \tag{6.80}$$
The solution turns out to be a combination of the optimal regulator for the
deterministic problem (LQR) and the optimal state estimator (Kalman filter):
$$\begin{aligned}\hat{x}(i+1|i+1) &= A\hat{x}(i|i)+Bu(i)+K(i+1)\left[y(i+1)-C(A\hat{x}(i|i)+Bu(i))\right]\\ u(i) &= -L(i)\hat{x}(i|i), \qquad i = p-1,\ldots,0,\end{aligned} \tag{6.81}$$
where
$$\begin{aligned}L(i) &= (B^TS(i+1)B+R)^{-1}B^TS(i+1)A\\ S(i) &= A^TS(i+1)A+Q-A^TS(i+1)B(B^TS(i+1)B+R)^{-1}B^TS(i+1)A\end{aligned} \tag{6.82}$$
with the initialization $S(p) = Q_t$, and
$$\begin{aligned}K(i+1) &= P(i+1|i)C^T\left(CP(i+1|i)C^T+R_2\right)^{-1}\\ P(i+1|i) &= AP(i|i-1)A^T+R_1-AP(i|i-1)C^T\left(CP(i|i-1)C^T+R_2\right)^{-1}CP(i|i-1)A^T\end{aligned} \tag{6.83}$$
with the initialization $P(1|0) = AR_0A^T+R_1$. The controller state is to be initialized with $\hat{x}(0|0) = \bar{x}_0$.
Derivation of the Optimal Controller Via Dynamic Programming
The derivation is based on stochastic dynamic programming and is somewhat involved. Readers not interested in the detailed derivation may skip this section without loss of continuity.
The dynamic programming recursion starts with the one-step problem conditioned on the available information:
$$J_1(I_{p-1}) = \min_{u(p-1)}E\left\{x^T(p)S(p)x(p)+x^T(p-1)Qx(p-1)+u^T(p-1)Ru(p-1)\ \middle|\ I_{p-1}\right\} \tag{6.84}$$
In the above, $I_{p-1}$ represents all the information available for the feedback control calculation at t = p−1, including the prior statistics and the collected measurements (e.g., $\bar{x}_0$, $R_0$, y(1), …, y(p−1)). Since the conditional statistics of x(p−1) summarize all the relevant information about the past measurements, we can choose
$$I_{p-1} = \{\hat{x}(p-1|p-1),\ P(p-1|p-1)\} \tag{6.85}$$
(6.85)
where x
(p1|p1) and P (p1|p1) are the conditional mean and covariance
of x(p 1). These quantities can be computed using a Kalman filter that is
started with the initial condition of x(0|0) = x
0 and P0|0 = R0 . Note that
x
(p|p 1) = A
x(p 1|p 1) + Bu(p 1)
(6.86)
$$\begin{aligned}J_1(I_{p-1}) = \min_{u(p-1)}\ &\left(A\hat{x}(p-1|p-1)+Bu(p-1)\right)^TS(p)\left(A\hat{x}(p-1|p-1)+Bu(p-1)\right)\\ &+\hat{x}^T(p-1|p-1)Q\hat{x}(p-1|p-1)+u^T(p-1)Ru(p-1)\\ &+E\left\{x_e^T(p)S(p)x_e(p)+x_e^T(p-1)Qx_e(p-1)\ \middle|\ I_{p-1}\right\}\end{aligned} \tag{6.87}$$
where $x_e(p) = x(p)-\hat{x}(p|p-1)$ and $x_e(p-1) = x(p-1)-\hat{x}(p-1|p-1)$, which are zero-mean and uncorrelated with $\hat{x}(p|p-1)$ and $\hat{x}(p-1|p-1)$, respectively.
The first three terms are the same as those that appeared for the deterministic
system, but with x(p 1) replaced by the mean value x
(p 1|p 1). The last
two terms, trace{S(p)P (p|p 1)} and trace{QP (p 1|p 1)} respectively, are
merely constants and do not affect the solution. Hence, the optimal solution
is in the same form as the deterministic case:
$$u(p-1) = -\underbrace{(B^TS(p)B+R)^{-1}B^TS(p)A}_{L(p-1)}\,\hat{x}(p-1|p-1) \tag{6.88}$$
$$J_1(I_{p-1}) = \hat{x}^T(p-1|p-1)S(p-1)\hat{x}(p-1|p-1)+\mathrm{trace}\{QP(p-1|p-1)\}+\mathrm{trace}\{S(p)P(p|p-1)\}, \tag{6.89}$$
where
$$S(p-1) = A^TS(p)A+Q-A^TS(p)B(B^TS(p)B+R)^{-1}B^TS(p)A \tag{6.90}$$
At the next stage, consider solving the two-step-ahead problem posed at time p−2:
$$J_2(I_{p-2}) = \min_{u(p-2)}E\left\{x^T(p-2)Qx(p-2)+u^T(p-2)Ru(p-2)+J_1(I_{p-1})\ \middle|\ I_{p-2}\right\} \tag{6.91}$$
In writing the last term, we used the fact that $P(p|p-1) = AP(p-1|p-1)A^T+R_1$. Now, note that
$$\hat{x}(p-1|p-1) = A\hat{x}(p-2|p-2)+Bu(p-2)+K(p-1)\underbrace{\left(y(p-1)-C\hat{x}(p-1|p-2)\right)}_{e(p-1)} \tag{6.92}$$
Note that the innovation term e(p−1) has the property that $E\{e(p-1)\mid I_{p-2}\} = 0$, and it is uncorrelated with $\hat{x}(p-2|p-2)$ or u(p−2). Substituting the above and evaluating the expectation yields
$$\begin{aligned}J_2(I_{p-2}) = \min_{u(p-2)}\ &\left(A\hat{x}(p-2|p-2)+Bu(p-2)\right)^TS(p-1)\left(A\hat{x}(p-2|p-2)+Bu(p-2)\right)\\ &+\hat{x}^T(p-2|p-2)Q\hat{x}(p-2|p-2)+u^T(p-2)Ru(p-2)\\ &+\mathrm{trace}\{S(p-1)(AP(p-2|p-2)A^T+R_1)\}+\mathrm{trace}\{QP(p-2|p-2)\}\\ &-\mathrm{trace}\{S(p-1)P(p-1|p-1)\}\\ &+\mathrm{trace}\{S(p)(AP(p-1|p-1)A^T+R_1)\}+\mathrm{trace}\{QP(p-1|p-1)\}\end{aligned} \tag{6.93}$$
The above is in the same form as J1 (Ip1 ) except for the constant terms, which
do not affect the solution. Therefore,
$$u(p-2) = -\underbrace{(B^TS(p-1)B+R)^{-1}B^TS(p-1)A}_{L(p-2)}\,\hat{x}(p-2|p-2) \tag{6.94}$$
In addition,
$$J_2(I_{p-2}) = \hat{x}^T(p-2|p-2)S(p-2)\hat{x}(p-2|p-2) + \sum_{j=p-1}^{p}\left[\mathrm{trace}\{S(j)(AP(j-1|j-1)A^T+R_1)\}+\mathrm{trace}\{QP(j-1|j-1)\}\right] \tag{6.95}$$
where
$$S(p-2) = A^TS(p-1)A+Q-A^TS(p-1)B(B^TS(p-1)B+R)^{-1}B^TS(p-1)A \tag{6.96}$$
and
$$\begin{aligned}P(p-1|p-1) &= P(p-1|p-2) - P(p-1|p-2)C^T\left(CP(p-1|p-2)C^T+R_2\right)^{-1}CP(p-1|p-2)\\ P(p-1|p-2) &= AP(p-2|p-2)A^T+R_1\end{aligned} \tag{6.97}$$
By induction, the previously shown optimal feedback policy is easily derived. The optimal cost (i.e., the lowest achievable cost through output feedback control) is
$$J_p(x_0) = \hat{x}^T(0|0)S(0)\hat{x}(0|0) + \sum_{j=1}^{p}\left[\mathrm{trace}\{S(j)(AP(j-1|j-1)A^T+R_1)\}+\mathrm{trace}\{QP(j-1|j-1)\}\right] \tag{6.98}$$
For the infinite horizon problem, we consider the time-averaged cost
$$V = \lim_{p\to\infty}E\left\{\frac{1}{p}\sum_{k=0}^{p-1}x^T(k)Qx(k)+u^T(k)Ru(k)\right\} \tag{6.99}$$
Taking the limit of the solution to the finite horizon problem readily yields the following output feedback law made up of the Kalman filter (the optimal state estimator) and the ∞-horizon LQR (the optimal state-feedback regulator for the deterministic problem):
$$\begin{aligned}\hat{x}(k|k) &= (A-BL_\infty)\hat{x}(k-1|k-1)+K(k)\left[y(k)-C(A-BL_\infty)\hat{x}(k-1|k-1)\right]\\ u(k) &= -L_\infty\hat{x}(k|k)\end{aligned} \tag{6.100}$$
Since K(k) often converges quickly to a steady-state solution $K_\infty$, the above can be implemented as the linear time-invariant controller
$$\begin{aligned}\hat{x}(k|k) &= (A-BL_\infty)\hat{x}(k-1|k-1)+K_\infty\left\{y(k)-C(A-BL_\infty)\hat{x}(k-1|k-1)\right\}\\ u(k) &= -L_\infty\hat{x}(k|k),\end{aligned} \tag{6.101}$$
where
$$K_\infty = P_\infty C^T\left(CP_\infty C^T+R_2\right)^{-1} \tag{6.102}$$
In summary, for the linear quadratic Gaussian control problem, the optimal output feedback controller is simply a composition of the optimal state
estimator and the optimal state feedback regulator for the deterministic case.
This result is referred to as the separation principle.
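For the steady-state controller (6.101)–(6.102), both gains come from algebraic Riccati equations. A minimal sketch (scipy assumed, illustrative matrices; the filter ARE is solved through the dual system) is given below.

```python
# Sketch: steady-state LQG = LQR gain + steady-state Kalman gain.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2); R = np.array([[0.1]])
R1 = 0.01 * np.eye(2); R2 = np.array([[0.01]])

S = solve_discrete_are(A, B, Q, R)                       # control ARE (6.31)
Linf = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)

P = solve_discrete_are(A.T, C.T, R1, R2)                 # filter ARE (dual)
Kinf = P @ C.T @ np.linalg.inv(C @ P @ C.T + R2)         # (6.102)

def lqg_step(xhat, y):
    """One step of the time-invariant controller (6.101)."""
    pred = (A - B @ Linf) @ xhat
    xhat_new = pred + Kinf @ (y - C @ pred)
    return xhat_new, -Linf @ xhat_new

xhat, u = lqg_step(np.zeros(2), np.array([0.3]))
print(xhat, u)
```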
6.4.3
Analysis
The system
$$x(k+1) = Ax(k)+Bu(k)+\varepsilon_1(k), \qquad y(k) = Cx(k)+\varepsilon_2(k) \tag{6.103}$$
with the controller
$$\begin{aligned}\hat{x}(k+1|k+1) &= A\hat{x}(k|k)+Bu(k)+K_\infty\left[y(k+1)-C(A\hat{x}(k|k)+Bu(k))\right]\\ u(k) &= -L_\infty\hat{x}(k|k)\end{aligned} \tag{6.104}$$
gives
$$\begin{bmatrix}x(k+1)\\ x_e(k+1)\end{bmatrix} = \underbrace{\begin{bmatrix}A-BL_\infty & BL_\infty\\ 0 & A-AK_\infty C\end{bmatrix}}_{\Phi}\begin{bmatrix}x(k)\\ x_e(k)\end{bmatrix} + \underbrace{\begin{bmatrix}I\\ I-K_\infty C\end{bmatrix}}_{\Gamma_1}\varepsilon_1(k) + \underbrace{\begin{bmatrix}0\\ -K_\infty\end{bmatrix}}_{\Gamma_2}\varepsilon_2(k) \tag{6.105}$$
(6.105)
The above equation is only one way coupled and the eigenvalues of the overall
closed-loop systems transition matrix are the eigenvalues of (A BL) and
the eigenvalues of (A AKC). Hence, if these eigenvalues are all stable (i.e.,
inside the unit disk), x and xe approach zero-mean stationary sequences (with
finite variances). In fact,
(
T )
x(k)
x(k)
E
=
(6.106)
xe (k)
xe (k)
where Σ is the positive (semi)-definite solution to the Lyapunov equation
$$\Phi\Sigma\Phi^T + \Gamma_1R_1\Gamma_1^T + \Gamma_2R_2\Gamma_2^T = \Sigma \tag{6.107}$$
Since the optimal feedback controller of the LQG problem consists of the
Kalman filter and the LQR, each of which is stable under the mild assumptions
stated earlier, the LQG controller is closed-loop stable under the same mild
assumptions.
Bibliography
1. Work on LQR. Kalman's original paper.
2. Work on LQG, separation principle. Cite the main paper. A good treatment of both topics can be found in Kwakernaak and Sivan's .......
3. A good discussion on various stochastic optimal control strategies and stochastic dynamic programming can be found in ??? by Bertsekas .....
Examples to Include
1. NIKET: Simple state regulation example - stable system, unstable system. Use 2nd order plus deadtime SISO system and double integrator.
2. NIKET: Output regulation in the presence of constant disturbances in
the output and setpoint changes. Use the same 2nd order plus deadtime SISO system but with constant disturbance and setpoint changes.
Output regulation instead of the state regulation. Use the idea below.
Suppose $y(k) = Cx(k) + d$, where d is some constant unknown disturbance. Let us pose the problem of regulating the output at the setpoint r as the minimization of
$$\sum_{i=0}^{p-1}e^T(k+i+1)Q_ye(k+i+1)+\Delta u^T(k+i)R\Delta u(k+i) \tag{6.108}$$
This can be put in the standard form through the augmented system
$$\begin{bmatrix}\Delta x(k+1)\\ e(k+1)\end{bmatrix} = \begin{bmatrix}A & 0\\ CA & I\end{bmatrix}\begin{bmatrix}\Delta x(k)\\ e(k)\end{bmatrix} + \begin{bmatrix}B\\ CB\end{bmatrix}\Delta u(k) \tag{6.109}$$
where $e(k) = y(k)-r$.
Denoting the above system as
$$z(i+1) = \Phi z(i) + \Gamma\Delta u(i) \tag{6.110}$$
the problem reduces to the standard regulation problem $\min\sum_{i=0}^{p-1}z^T(i)Qz(i)+\Delta u^T(i)R\Delta u(i) + z^T(p)Q_tz(p)$, where
$$Q = \begin{bmatrix}0 & 0\\ 0 & Q_y\end{bmatrix} \tag{6.111}$$
$$Q_t = \begin{bmatrix}0 & 0\\ 0 & Q_{yt}\end{bmatrix} \tag{6.112}$$
3. NIKET: Simple stochastic output feedback problem. Use the 2nd-order-plus-deadtime system as before. Compare the ∞-horizon cost of open-loop optimal control and closed-loop optimal control. Compare those values obtained from the formula we derived with the cost computed from the solution of the Lyapunov equation (6.107). Finally, do 100 Monte-Carlo simulations, compute the average cost, and compare.
4. NIKET: LQG design for the 4-block output feedback problem. Distillation column, temperature measured, compositions controlled, feed
flowrate and composition disturbances (integrated white noise), L and
V manipulated inputs.
Exercises to Give
1. For the linear time-invariant deterministic system we studied, derive the
quadratic optimal feedback controller for a given time-varying reference
trajectory $\{r(k),\ k = 1,\ldots,p\}$ within a finite horizon. Use the dynamic
programming approach.
2. Prove that the solution to equation (6.41) indeed provides the infinite
horizon cost as stated.
Solution:
3. NIKET: (Give a stochastic state-space system.) Compute the open-loop optimal cost and feedback optimal cost using the formulae given in the book. Compare.
4. Derive an expression for the optimal cost for the infinite horizon LQG
problem.
5. NIKET: (Give an infinite horizon output feedback problem for a stochastic system.) Derive the optimal feedback (LQG) controller. Compute
the optimal cost using the formula you obtained above. Compute the
same cost by writing the closed-loop equation for the obtained controller
and solving the Lyapunov equation (6.107). Finally, perform the MonteCarlo simulation with the controller and compute the average cost.
6. Show that Σ obtained from solving (6.107),
$$\Phi\Sigma\Phi^T+\Gamma_1R_1\Gamma_1^T+\Gamma_2R_2\Gamma_2^T = \Sigma, \tag{6.114}$$
indeed represents the closed-loop covariance.
Chapter 7
Constrained Quadratic Optimal Control
In this chapter, we study the linear quadratic optimal control of constrained systems. We consider the same problems as those studied in the previous chapter, but with constraints on the inputs and the state variables. Both control computation and closed-loop analysis become considerably more complicated when constraints are added. This is because both the system and the resulting control law are nonlinear, and the closed-loop system no longer lends itself to simple linear analysis methods.
7.1 Finite Horizon Problem

Let us consider the same finite horizon optimal control problem as the one in Section 6.1,
$$\min_{u(0),\ldots,u(p-1)}\left\{\sum_{k=0}^{p-1}x^T(k)Qx(k)+u^T(k)Ru(k)+x^T(p)Q_tx(p)\right\} \tag{7.1}$$
subject to
$$x(k+1) = Ax(k)+Bu(k), \qquad k = 0,\ldots,p-1 \tag{7.2}$$
$$x(0) = x_0 \tag{7.3}$$
except that now we constrain the state and inputs to lie within some feasible sets, $X_{fsb}$ and $U_{fsb}$:
$$x(k) \in X_{fsb}, \qquad k = 1,\ldots,p \tag{7.4}$$
$$u(k) \in U_{fsb}, \qquad k = 0,\ldots,p-1 \tag{7.5}$$
In the simplest case upper and lower bounds would be imposed on x and u.
In general, we will assume that the sets are convex, compact and include the
origin as an interior point so that the origin is a feasible stationary point for
the system. We also assume that the feasible sets are time invariant. The
resulting problem is a constrained least squares problem and no longer yields
a simple analytical solution. A numerical technique must be used. For the
convenience of presentation, we will refer to the above finite horizon optimal
control problem as Jp (x0 ); the same notation will also be used to denote the
optimal cost.
Dynamic programming is not a practically feasible solution in this case.
Numerical solution through dynamic programming requires discretization of
the state space and computation / storage of the optimal cost in the discretized
state space at each stage, as one marches backward from the last stage to the
first stage. Obviously the number of discrete points increases exponentially fast
with state dimension and the computational load for this approach becomes
quickly intractable. This is referred to as the curse of dimensionality.
A more practical alternative is to use the explicit multi-step-prediction-based least squares formulation introduced in section 6.1. Substituting the expression for the prediction (6.7) into the objective function in (6.8) and the constraint in (7.4) yields the following convex program, which can be solved numerically for a particular $x_0$:
$$\min_U\ U^T\left(S^{uT}\bar{Q}S^u+\bar{R}\right)U + 2U^TS^{uT}\bar{Q}S^xx_0 \tag{7.6}$$
subject to
$$S^xx_0+S^uU \in X_{fsb}\times\cdots\times X_{fsb} \tag{7.7}$$
$$U \in U_{fsb}\times\cdots\times U_{fsb} \tag{7.8}$$
7.2 Infinite Horizon Problem

Consider the constrained infinite horizon problem
$$\min_{u(0),\ldots,u(\infty)}\sum_{k=0}^{\infty}x^T(k)Qx(k)+u^T(k)Ru(k) \tag{7.9}$$
for the same deterministic system with initial condition $x_0$ and constraints
$$x(k) \in X_{fsb}, \qquad k = 1,\ldots,\infty \tag{7.10}$$
$$u(k) \in U_{fsb}, \qquad k = 0,\ldots,\infty \tag{7.11}$$
As before, we consider solving, at each time k, a finite horizon approximation:
$$\min_{u(k),\ldots,u(k+p-1)}\left\{\sum_{i=0}^{p-1}x^T(k+i)Qx(k+i)+u^T(k+i)Ru(k+i)+x^T(k+p)Q_tx(k+p)\right\} \tag{7.12}$$
subject to
$$x(k+i) \in X_{fsb}, \qquad i = 1,\ldots,p \tag{7.13}$$
$$u(k+i) \in U_{fsb}, \qquad i = 0,\ldots,p-1 \tag{7.14}$$
Option 1 A Because the origin was assumed to be in the interior of the feasible state and input set, the optimal control will become unconstrained
as the system approaches the origin. Therefore if we choose p large
enough we can solve the constrained infinite horizon problem exactly in
the following manner.
Choose the terminal weight $Q_t$ as the solution of the Riccati equation
$$A^TQ_tA + Q - A^TQ_tB(B^TQ_tB+R)^{-1}B^TQ_tA = Q_t$$
This way, the terminal cost reflects the unconstrained infinite horizon cost.
Choose p sufficiently large so that the solution yields a terminal state x(k+p) that satisfies
$$u(k+i) = -L_\infty x(k+i) \in U_{fsb} \qquad \forall i \ge p \tag{7.15}$$
and
$$x(k+i) \in X_{fsb} \qquad \forall i \ge p \tag{7.16}$$
Option 1 B Alternatively, the requirement that the unconstrained law remain feasible beyond the horizon can be imposed explicitly as a terminal constraint
$$x(k+p) \in \bar{X} \tag{7.18}$$
where $\bar{X}$ is a set (e.g., a maximal output admissible set) inside which the unconstrained control law satisfies the constraints for all future times.
Now p has to be chosen only so that the terminal constraint (7.18) is feasible. Forcing such an artificial constraint in the optimization, however,
can make the solution different from the solution to the original infinite
horizon problem. In other words, the equivalence holds only when the
terminal constraint is inactive.
Option 2 A Rather than assuming the unconstrained control to be optimal
after p, we can require for open-loop stable systems that p be chosen long
enough so that the system state remains in the feasible region without
control.
Following the discussion given in section 6.2.2, we choose the terminal weight $Q_t$ as the solution of the Lyapunov equation
$$A^TQ_tA + Q = Q_t \tag{7.19}$$
and choose p large enough that the uncontrolled state remains feasible beyond the horizon.
Option 2 B As in Option 1 B, this requirement can instead be imposed as an explicit terminal set constraint on x(k+p).
Option 2 C For an unstable system, as discussed in section 6.2.2, the unstable modes must in addition be zeroed at the end of the horizon, $T_2x(k+p) = 0$, with the terminal weight obtained from the Lyapunov equation for the stable subsystem.
Option 3 Increasing the terminal weight for the finite horizon problem, $Q_t \to \infty\cdot I$, has the same effect as imposing an extra end-point or terminal equality constraint $x(k+p) = 0$. If this terminal constraint is used in the
infinite horizon formulation, all the terms for times k + p, k + p + 1, etc.
in the objective function will be zero and the infinite horizon problem
and the finite horizon problem will become equivalent. The system must
be controllable for this constraint to be feasible, however.
It is worthwhile to note that the options are listed in the order of increasing
restrictiveness and lead to increasing deviations from the optimal solution
of the constrained infinite horizon problem. The difficulty inherent in Options
1 A and 2 A is that one needs to compute p and that p can be very large
leading to computational problems. This problem is alleviated somewhat in
Options 1 B and 2 B but p must still be chosen sufficiently large to ensure
the feasibility of the terminal constraint. In Options 2 C and 3, some or all
the state variables need to be zeroed at the end of the prediction horizon.
Equality constraints are more difficult to handle from a numerical standpoint,
and infeasibility can arise if the control horizon is not sufficiently large.
7.3
Constraint Softening
For example, state constraints of the form
$$Fx(k) \le g \tag{7.23}$$
would be relaxed to
$$Fx(k) \le g + \epsilon(k), \tag{7.24}$$
where the vector ε(k) denotes the positive slack variables expressing the constraint relaxation or violation. In order to keep the violations small, the objective function is augmented by a term penalizing the violations:
$$V_p = \min_{u(0),\ldots,u(p-1),\,\epsilon(0),\ldots,\epsilon(p-1)}\left\{\sum_{i=0}^{p-1}x^T(i)Qx(i)+u^T(i)Ru(i)+\epsilon^T(i)Q_\epsilon(i)\epsilon(i)\right\} \tag{7.25}$$
The positive definite weighting matrix $Q_\epsilon(i)$ determines how much the various violations are penalized. The suggested form allows the designer to penalize the square of the violations in each constraint at each time differently and thus offers maximum freedom. This, however, comes at the price of introducing many additional tuning variables into the optimization. If one chooses the vector ε(k) to be constant rather than varying with time, for each constraint only the worst violation over the horizon is penalized. If the same scalar ε(k) is used for each constraint, then at each time step only the worst one of the constraint violations is penalized. By choosing the form of ε and the magnitude of the elements of $Q_\epsilon$ appropriately, a proper trade-off among computational complexity, magnitude of constraint violation, and its duration can be achieved.
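Constraint softening fits naturally into the same numerical framework. The sketch below (cvxpy assumed; the system, constraint, and weights are illustrative) relaxes a state constraint with nonnegative slacks penalized as in (7.25).

```python
# Sketch: MPC with a softened state constraint F x <= g + eps.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2); R = np.array([[0.1]])
F = np.array([[1.0, 0.0]]); g = np.array([0.2])   # constraint: x1 <= 0.2
Q_eps = 100.0                                     # slack penalty weight
p, x0 = 15, np.array([1.0, 0.0])

u = cp.Variable((p, 1))
x = cp.Variable((p + 1, 2))
eps = cp.Variable((p, 1), nonneg=True)            # positive slack variables

cost = 0
cons = [x[0] == x0]
for k in range(p):
    cost += cp.quad_form(x[k], Q) + cp.quad_form(u[k], R) \
            + Q_eps * cp.sum_squares(eps[k])
    cons += [x[k + 1] == A @ x[k] + B @ u[k],
             F @ x[k + 1] <= g + eps[k],          # softened state constraint
             cp.abs(u[k]) <= 1.0]                 # hard input constraint
cp.Problem(cp.Minimize(cost), cons).solve()
print(eps.value.max())    # violations stay small when Q_eps is large
```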
7.4
7.5 Analysis
In this section, we examine the stability of the finite receding horizon control
law. We examine both the state feedback case and the output feedback case.
For our analysis to be meaningful, we must assume that the underlying system
is stabilizable. For a constrained system, the stabilizability can depend on both
the system model and the initial condition. If the system contains unstable
modes, the system may not be globally stabilizable. In this case, we must
assume that the initial condition lies inside the domain of attraction.
7.5.1 Stability of Nonlinear Systems

For linear systems, a single definition of stability sufficed. For nonlinear systems, different definitions exist. We consider here the definitions of Lyapunov stability (or stability in the sense of Lyapunov), asymptotic stability, and exponential stability. We also review Lyapunov's direct method, which is useful for analyzing the stability of nonlinear systems.
Stability Concepts
Definition 2 The equilibrium x = 0 is said to be stable (in the sense of Lyapunov) if, for any ε > 0, there exists a corresponding δ > 0 such that, if ‖x(0)‖ < δ, then ‖x(t)‖ < ε for all t ≥ 0. Otherwise the equilibrium is said to be unstable.

Definition 3 The equilibrium x = 0 is said to be asymptotically stable if it is stable (in the sense of Lyapunov) and there exists some δ such that ‖x(0)‖ < δ implies x(t) → 0 as t → ∞.
The second condition for asymptotic stability is called attractivity and the
set of all points such that trajectories initiated from these points converge to
the origin is called domain of attraction. Note that, while attractivity and
asymptotic stability are equivalent in linear systems, the former is necessary
but not sufficient for the latter in nonlinear systems.
The stability concepts are pictorially illustrated in Figure 7.1. Note that,
in the context of a linear system, a marginally stable system, for which some
of the eigenvalues lie on the unit circle, would be considered stable according
to the above definition of stability. However, it wouldn't be asymptotically stable.
We also introduce the notion of exponential stability.
Definition 4 The equilibrium x = 0 is said to be exponentially stable if it is stable (in the sense of Lyapunov) and there exists some δ such that, if ‖x(0)‖ < δ, then
$$\|x(t)\| \le \alpha\|x(0)\|e^{-\lambda t}, \qquad t > 0$$
for some α > 0 and λ > 0.

Exponential stability is a stronger property than asymptotic stability. For linear systems, the two stability concepts are equivalent.
Figure 7.1: Concepts of stability for nonlinear systems.
Finally, the above definitions for asymptotic stability and exponential stability are for local stability. For global stability, the conditions need to hold for
any starting state, not just a starting state within some ball.
Definition 5 The equilibrium x = 0 is said to be globally asymptotically
(or exponentially) stable if asymptotic (or exponential) stability holds for any
starting state.
Lyapunovs Direct Method
Lyapunovs direct method is based on the idea that, if an energy or energy-like
function of the system is continuously dissipated, the system must settle down
to an equilibrium. It is one of the most popular and useful stability analysis
techniques available for general nonlinear systems. We present this method in
the context of a discrete-time system.
Let x be the state of some autonomous nonlinear dynamic system x(k+1) = f(x(k)). Consider a scalar function V(x) such that, within some ball B = {x : ‖x‖ < ε},
1. V(0) = 0 and V(x) > 0 for x ≠ 0. Such a function is called a locally positive-definite function.
2. V(x) < ∞ for ‖x‖ < ε.
3. For any trajectory of x generated from the system, V(x(k+1)) − V(x(k)) ≤ 0 for all k ≥ 0.
Such a V(x) is called a Lyapunov function for the system x(k+1) = f(x(k)).
The existence of a Lyapunov function implies Lyapunov stability. In addition, if the last condition is strengthened to V(x(k+1)) − V(x(k)) < 0, the existence of such a V(x) implies asymptotic stability.
For global asymptotic stability, the above conditions need to be satisfied for any x, not just within some ball. Additionally, V(x) has to satisfy the following condition, called radial unboundedness:
$$V(x) \to \infty \quad \text{as} \quad \|x\| \to \infty$$
The main difficulty with Lyapunov's direct method is that it is not always clear how to choose a Lyapunov function. Later, we will use this method to prove the asymptotic stability of some of the suboptimal receding horizon control solutions we introduced earlier.
7.5.2 State Feedback Case

Consider the receding horizon policy in which, at each time k, one solves the infinite horizon problem with only p free control moves and no control action thereafter:
$$J_{\infty,p}(x(k)) = \min_{u(k),\ldots,u(\infty)}\sum_{i=0}^{\infty}x^T(k+i)Qx(k+i)+u^T(k+i)Ru(k+i) \tag{7.28}$$
subject to
$$x(k+i) \in X_{fsb}, \qquad i = 1,\ldots,\infty \tag{7.29}$$
$$u(k+i) \in U_{fsb}, \qquad i = 0,\ldots,p-1 \tag{7.30}$$
$$u(k+i) = 0, \qquad i \ge p \tag{7.31}$$
Then,
$$J_{\infty,p}(x(k)) \ge J_{\infty,p}(x(k+1)) + x^T(k)Qx(k)+u^T(k)Ru(k) \tag{7.32}$$
To see how this inequality arises, let us define the restriction of an input sequence U as
$$\mathcal{R}(U) = \begin{bmatrix}0 & I & 0 & \cdots & 0\\ 0 & 0 & I & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & \cdots & 0 & I\\ 0 & 0 & \cdots & 0 & 0\end{bmatrix}U \tag{7.33}$$
that is, the sequence shifted forward by one step, with the first input dropped and a zero appended.
(7.32) is true because $\mathcal{R}(U^*(k))$, the restriction of the optimal solution for $J_{\infty,p}(x(k))$, represents a feasible but possibly suboptimal solution for $J_{\infty,p}(x(k+1))$. Hence, denoting the corresponding suboptimal cost as $\tilde{J}_{\infty,p}(x(k+1))$, it is clear that
$$J_{\infty,p}(x(k)) = \tilde{J}_{\infty,p}(x(k+1)) + x^T(k)Qx(k)+u^T(k)Ru(k) \tag{7.34}$$
and, since $\tilde{J}_{\infty,p}(x(k+1)) \ge J_{\infty,p}(x(k+1))$, the inequality (7.32) follows. Applying (7.32) repeatedly and summing gives
$$J_{\infty,p}(x(0)) - J_{\infty,p}(x(k)) \ge \sum_{i=0}^{k-1}\left[x^T(i)Qx(i)+u^T(i)Ru(i)\right] \tag{7.36}$$
Since $\sum_{i=0}^{k-1}\left[x^T(i)Qx(i)+u^T(i)Ru(i)\right] \ge 0$, the left-hand side is bounded below by zero. With $J_{\infty,p}(x(0)) < \infty$ and $J_{\infty,p}(x(k)) \ge 0$, the left-hand side is finite for all k ≥ 0. So the right-hand side must also be finite for all k ≥ 0. This means, with $k = \infty$,
$$\sum_{k=0}^{\infty}\left[x^T(k)Qx(k)+u^T(k)Ru(k)\right] < \infty \tag{7.37}$$
In addition, there exist a ball B around the origin and a constant b > 0 such that
$$J_{\infty,p}(x) \le b\|x\|^2 \qquad \forall x \in B \tag{7.38}$$
Such a ball always exists because, when the state is sufficiently close to the origin, the optimal control law is linear and the optimal cost is a quadratic function of x, as was proven in the discussion of multi-parametric programming. From the preceding argument,
$$J_{\infty,p}(x(k)) \le J_{\infty,p}(x(0)) \qquad \forall k > 0 \tag{7.39}$$
Also, given the form of the objective function (quadratic weighting of the state vector with a positive-definite weighting matrix),
$$\exists\, a > 0 \ \text{such that}\ a\|x(k)\|^2 \le J_{\infty,p}(x(k)) \tag{7.40}$$
Combining the bounds, for $x(0) \in B$,
$$a\|x(k)\|^2 \le J_{\infty,p}(x(k)) \le J_{\infty,p}(x(0)) \le b\|x(0)\|^2 \tag{7.41}$$
which implies
$$\|x(k)\| \le \left(\frac{b}{a}\right)^{1/2}\|x(0)\| \tag{7.42}$$
Hence,
$$\|x(k)\| \le \min\left\{\left(\frac{b}{a}\right)^{1/2},\, c\right\}\|x(0)\| \tag{7.43}$$
which establishes stability in the sense of Lyapunov; boundedness of the cost sum (7.37) then gives x(k) → 0, i.e., asymptotic stability.
Comments

The assumption of Q > 0 can be relaxed to Q ≥ 0 plus detectability of $(Q^{1/2}, A)$. The detectability means that the modes that are not weighed in the objective function, and therefore evolve according to their autonomous dynamics, are asymptotically stable. Hence the preceding argument, applied to those modes that are weighed in the objective function, is sufficient for proving asymptotic stability.
Extending the same proof to the case where the state constraints are softened through slack variables, as in (7.25) with $p = \infty$, is straightforward, assuming the same weighting matrix $Q_\epsilon$ is used throughout the horizon. The proof is left as an exercise. One potential complication in the implementation, however, is that the set $\bar{X}_{ma}$ in the terminal constraint is
The same argument applies to the formulation with the terminal constraint x(k+p) = 0, for which
$$J_p(x(k)) \ge J_p(x(k+1)) + x^T(k)Qx(k)+u^T(k)Ru(k) \tag{7.44}$$
The inequality follows from the fact that the restriction of the solution for $J_p(x(k))$ is a feasible solution for $J_p(x(k+1))$, because x(k+p) = 0 implies x(k+p+1) = 0. However, the assumption of $J_p(x(0)) < \infty$ is more restrictive in this case.
Alternatively, we can consider applying Lyapunov's direct method discussed earlier. Here, we may consider the optimal cost function $J_p(x)$, which represents $J_p(x_0)$ with $x_0 = x$, as a Lyapunov function candidate for the closed-loop system $x(k+1) = Ax(k)+Bu^*(x(k))$. The fact that $J_p(x)$ is a positive-definite function follows immediately from the fact that x = 0 is an equilibrium point (and therefore $J_p(0) = 0$) and Q > 0, R > 0 (and therefore $J_p(x) > 0$ for x ≠ 0). Also, $V(x(0)) < \infty$ by assumption. Hence, the key condition to prove for asymptotic stability is the negative definiteness of $J_p(x(k+1)) - J_p(x(k))$. Note that
$$J_{p-1}(x(k+1)) = J_p(x(k)) - \left[x^T(k)Qx(k)+u^T(k)Ru(k)\right] \tag{7.46}$$
since the optimal solution to $J_{p-1}(x(k+1))$ is provided by the restriction of the optimal solution to $J_p(x(k))$, which also is a feasible solution to $J_p(x(k+1))$. Since this particular feasible solution makes x(k+p) = 0 and hence x(k+p+1) = 0, $J_{p-1}(x(k+1))$ is also the cost for the p-step problem under the same (suboptimal) solution.
Since $-\left(x^T(k)Qx(k)+u^T(k)Ru(k)\right)$ is negative-definite, negative semi-definiteness of $J_p(x(k+1)) - J_{p-1}(x(k+1))$ implies
$$J_p(x(k+1)) - J_p(x(k)) < 0 \tag{7.47}$$
7.5.3 Output Feedback Case

Here we will show that, if the state feedback law $u(k) = u^*(x(k))$ yielding an asymptotically stable closed-loop system (where $u^*(\cdot)$ represents a solution operator to a constrained quadratic problem) is coupled with a stable linear observer, then the resulting output feedback law $u(k) = u^*(\hat{x}(k|k))$ also yields an asymptotically stable closed-loop system. In other words, the stability of the state feedback controller and that of the observer can be checked separately for the stability of the combined system.
The key point in proving the above is that the stable MPC feedback laws we
described earlier are nonlinear but the nonlinearity is relatively well-behaved.
Specifically, they are Lipschitz continuous, which means that
there exists a fixed constant K such that $\|u^*(x+\delta)-u^*(x)\| \le K\|\delta\|$ for all δ and for all x.
For example, for the case that the constraints are linear inequalities yielding
a QP for the control calculation, we have already proven that the resulting
feedback law u (x) is a piecewise affine function of x, which is clearly Lipschitz
continuous.
The closed-loop system under the output feedback controller can be written
as follows:
$$x(k+1) = Ax(k)+Bu^*(\hat{x}(k|k)) = Ax(k)+Bu^*(x(k))+B\left[u^*(\hat{x}(k|k))-u^*(x(k))\right] \tag{7.48}$$
A key fact from nonlinear stability theory is that, for such a Lipschitz-continuous closed-loop map, the origin remains an asymptotically stable fixed point for the perturbed system $x(k+1) = g(x(k)) + e(k)$, where e(k) is an exponentially converging sequence. The proof of the above is somewhat involved and is skipped here, but the interested readers can find it in the references provided at the end of this chapter.
What remains to show, in order to prove the asymptotic stability of the output feedback controller, is that $B[u^*(\hat{x}(k|k))-u^*(x(k))]$ is indeed an exponentially converging sequence. Since $u^*(x)$ is Lipschitz continuous,
$$\left\|B\left[u^*(\hat{x}(k|k))-u^*(x(k))\right]\right\| \le K\|x(k)-\hat{x}(k|k)\| \quad \text{for some } K > 0. \tag{7.49}$$
Since $(x(k)-\hat{x}(k|k))$ is an exponentially converging sequence due to the assumption of a stable linear observer,
$$\|x(k)-\hat{x}(k|k)\| \le \alpha e^{-\lambda k} \quad \text{for some } \alpha > 0 \text{ and } \lambda > 0, \tag{7.50}$$
and therefore
$$\|u^*(x(k))-u^*(\hat{x}(k|k))\| \le K\alpha e^{-\lambda k} \tag{7.51}$$

7.6 Stochastic Systems
For a stochastic system, one simple option is to impose the state constraint on the expected value:
$$E\{x(k+i)\} \in X_{fsb}, \qquad i = 1,\ldots,p \tag{7.52}$$
However, the above does not guarantee the satisfaction of the constraint with much confidence. A better alternative is based on the chance constraint
$$\Pr\{x(k+i) \in X_{fsb},\ i = 1,\ldots,p\} \ge \alpha \tag{7.53}$$
for some confidence level α close to one.
The receding horizon problem then becomes
$$J_p(x(k)) = \min_{u(k),\ldots,u(k+p-1)}E\left\{\sum_{i=0}^{p-1}x^T(k+i)Qx(k+i)+u^T(k+i)Ru(k+i)\right\} \tag{7.54}$$
subject to
$$x(k+i+1) = Ax(k+i)+Bu(k+i)+\varepsilon_1(k+i), \qquad i = 0,\ldots,p-1 \tag{7.55}$$
$$\Pr\{x(k+i) \in X_{fsb},\ i = 1,\ldots,p\} \ge \alpha \tag{7.56}$$
$$u(k+i) \in U_{fsb}, \qquad i = 0,\ldots,p-1 \tag{7.57}$$
Besides the computational difficulty involved in handling the chance constraint, $u(k) = u^*(x(k))$ (or $u(k) = u^*(\hat{x}(k|k))$) resulting from solving the above program at each time step does not represent the optimal feedback law, even with $p = \infty$. This is different from the unconstrained case, in which, due to the separation principle, the optimal feedback solution reduces to that of the deterministic case coupled with the optimal estimator.
If we replace the chance constraint with the less desirable alternative of $E\{x(k+i)\} \in X_{fsb}$, the solution to the above open-loop optimal control problem turns out to be identical to the deterministic case. Hence, considering the stochastic nature of the system in the control calculation does not provide anything extra.
Despite these theoretical difficulties, the deterministic formulation has been
applied successfully in many stochastic problems. It has been observed empirically that, since constrained linear systems represent only a minor departure
from linear systems, especially when there are no state constraints, the combination of the optimal deterministic regulator (constrained LQR) with the
optimal state estimator yields a satisfactory suboptimal solution in almost all
cases.
Examples to Include
1. Performance of constrained vs. unconstrained on a simple problem.
(double integrators?)
2. Comparison of performances among different approximations. Use a
simple 1st order or 2nd order system. Use a short horizon to make the
comparison more meaningful.
Bibliography
Different choice of terminal penalty to approximate the infinite horizon
control. Kwon and Pearson, Book by Bitmead, Keerthi and Gilbert,
Mayne and Michalska, Rawlings and Muske, etc.
Maximal Output Admissible Set theory by Gilbert and Tan.
Parametric Quadratic Programming solution by Bemporad et al.
Stability of constrained LQR. Keerthi and Gilbert, Mayne and Michalska, Rawlings and Muske, etc.
Constraint relaxing schemes. Various formulations and trade-offs (Rawlings). Exact softening.
Stability for LQR with soft constraints. Unstable systems with constrained inputs. Zheng and Morari.
Stability proof for output feedback problem. Rawlings, etc. Halanay.
Chapter 12
Identification
Identification of process dynamics is perhaps the most time-consuming step
in implementing a model predictive controller and one that requires relatively
high expertise from the user. In this section, we give an introduction to various
identification methods and touch upon some key issues. Since system identification is a very broad subject that can easily take up an entire book, we will
limit our objective to giving just an overview and providing a starting point
for further exploration of the field. Hence, our treatment of various methods
and issues will be somewhat brief and informal. References will be given at
the end for more complete, detailed treatments of the various topics presented
in this chapter.
12.1 Problem Overview
The goal of system identification is to build a mathematical relation for predicting the system behavior by using input-output data gathered from the process. For convenience, the mathematical relation searched for is often limited to linear ones. As we discussed in Chapter ?? (Linear Time-Invariant System Models), both known and unknown inputs affect the outputs. Since not all inputs change in a deterministic manner, it is often desirable to identify a model that has both deterministic and stochastic components.
In terms of how input-output data are translated into a mathematical relation, the field of identification can be divided broadly into two branches: parametric identification and nonparametric identification. In parametric identification, the structure of the mathematical relation is parameterized (compactly) a priori and the parameters of the structure are fitted to the data. In nonparametric identification, no (or very little) assumption is made
with respect to the model structure. Frequency response identification is nonparametric. Impulse response identification is also nonparametric, but it can also be viewed as parametric identification, since an impulse response of finite length is described by a finite number of parameters.
12.2 Parametric Identification

12.2.1 Model Structures
The general model structure we consider is
$$y(k) = G(q,\theta)u(k) + H(q,\theta)\varepsilon(k) \tag{12.1}$$
where y is the output and u is the input. (Most of the time, u will be a manipulated input, but it can also be a measured disturbance variable.) G(q,θ), referred to as the process model, represents the causal relationship between the deterministic input u and the output y. ε(k) is a white noise sequence, which by itself does not represent any physical variable. Together with the noise model H(q,θ), it defines the auto- and cross-correlation functions of the residual sequence (y(k) − G(q,θ)u(k)). For a stationary process, without loss of generality, H(q,θ) is assumed to be a stable, stably invertible, and normalized (i.e., H(∞,θ) = 1) transfer function. This is in view of the spectral factorization theorem, which states that any spectrum can be factorized in terms of a stable and stably invertible factor (see Appendix ...). For processes exhibiting random mean shifts, it is necessary that the noise model include an integrator. Equivalently, we can replace y(k) and u(k) with Δy(k) and Δu(k) in the above.
Within the general structure, different parametrizations exist. Let us discuss some popular ones, first in the single-input, single-output context.
ARX Model
If we represent G as a rational function and express the model as a linear equation with an additive error term, we obtain
$$y(k)+a_1y(k-1)+\cdots+a_ny(k-n) = b_1u(k-1)+\cdots+b_mu(k-m)+\varepsilon(k) \tag{12.2}$$
When the equation error ε(k) is taken as a white noise sequence, the resulting model is called an ARX model (AR for Auto-Regressive and X for eXtra input u). Hence, the ARX model corresponds to the following parametrization of the transfer functions:
$$G(q,\theta) = \frac{B(q)}{A(q)} = \frac{b_1q^{-1}+\cdots+b_mq^{-m}}{1+a_1q^{-1}+\cdots+a_nq^{-n}} \tag{12.3}$$
$$H(q,\theta) = \frac{1}{A(q)} = \frac{1}{1+a_1q^{-1}+\cdots+a_nq^{-n}} \tag{12.4}$$
Note that for the ARX structure the one-step-ahead predictor,
$$\hat{y}(k|k-1) = \left[1-H^{-1}(q,\theta)\right]y(k) + \underbrace{H^{-1}(q,\theta)G(q,\theta)}_{B(q)}u(k), \tag{12.5}$$
is linear in the parameters.

ARMAX Model
Allowing the equation error to be a moving average of the white noise gives the ARMAX model $A(q)y(k) = B(q)u(k)+C(q)\varepsilon(k)$, for which
$$G(q,\theta) = \frac{B(q)}{A(q)}, \qquad H(q,\theta) = \frac{C(q)}{A(q)} = \frac{1+c_1q^{-1}+\cdots+c_\ell q^{-\ell}}{1+a_1q^{-1}+\cdots+a_nq^{-n}} \tag{12.7}$$
Output Error Model
Another option is to model the noise-free response and add the error directly at the output:
$$\check{y}(k) = \frac{B(q)}{A(q)}u(k), \qquad y(k) = \check{y}(k)+\varepsilon(k) \tag{12.8}$$
In the above, $\check{y}(k)$ represents the noise-free output. Customarily, ε(k) is assumed to be a white noise. This means the OE structure is equivalent to
$$G(q,\theta) = \frac{B(q)}{A(q)} \quad\text{and}\quad H(q) = 1 \tag{12.9}$$
From this, it may seem that the structure is not that useful (since disturbance/noise effects in most cases are auto-correlated and therefore not adequately represented by a white noise). However, the use of the OE structure can be more general. For instance, the OE model can be used when H(q) is not 1 but is a priori known (i.e., the residual sequence is a colored noise of a known spectrum). In this case, we can write the model as
$$\underbrace{H^{-1}(q)y(k)}_{y_f(k)} = G(q,\theta)\,\underbrace{H^{-1}(q)u(k)}_{u_f(k)} + \varepsilon(k) \tag{12.10}$$
Note that the above is in the form of (12.8). Simple filtering of the input and output de-correlates the noise and gives the standard OE structure. Parameter estimation is complicated by the fact that the $\check{y}$'s are not given and depend on the choice of parameters.
FIR and Orthogonal Expansion Models
Any stable G(q) can be expanded as a power series in $q^{-1}$:
$$G(q) = \sum_{i=1}^{\infty}b_iq^{-i} \tag{12.11}$$
Truncating the power series after n terms, one obtains the finite impulse response (FIR) model
$$G(q) = \sum_{i=1}^{n}b_iq^{-i} \tag{12.12}$$
More generally, G can be expanded in terms of orthogonal basis functions $\{B_i(q)\}$:
$$G(q) = \sum_{i=1}^{n}b_iB_i(q) \tag{12.13}$$
One popular choice for $\{B_i(q)\}$ is the so-called Laguerre functions defined as
$$B_i(q) = \frac{\sqrt{1-a^2}}{q-a}\left(\frac{1-aq}{q-a}\right)^{i-1} \tag{12.14}$$
where $|a| < 1$ is a user-chosen pole.
Box-Jenkins Model
A natural generalization of the output error model is to let the disturbance transfer function be a rational function of unknown parameters. This leads to the Box-Jenkins model, which has the structure
$$y(k) = \frac{B(q)}{A(q)}u(k) + \frac{C(q)}{D(q)}\varepsilon(k) \tag{12.15}$$
This model structure is quite general, but the parameter estimation is nonlinear and loss of identifiability can occur.
General Model
The most general structure combines the above:
$$A(q)y(k) = \frac{B(q)}{F(q)}u(k) + \frac{C(q)}{D(q)}\varepsilon(k) \tag{12.16}$$
This way some but not all poles can be shared between the process model and the noise model. Note that this model includes all the other models as subsets.
All of the above models can be generalized to include an integrator in H(q,θ). For instance, we can extend the ARMAX model to
$$y(k) = \frac{B(q)}{A(q)}u(k) + \frac{1}{1-q^{-1}}\frac{C(q)}{A(q)}\varepsilon(k), \tag{12.17}$$
or, equivalently, in terms of differenced data,
$$\Delta y(k) = \frac{B(q)}{A(q)}\Delta u(k) + \frac{C(q)}{A(q)}\varepsilon(k) \tag{12.18}$$
All of the above structures carry over to the multivariable case with the scalar polynomials replaced by polynomial matrices, e.g., the multivariable ARX model
$$A(q)y(k) = B(q)u(k)+\varepsilon(k) \tag{12.19}$$
With
all the matrix entries assumed unknown, for instance, one can end up with
an over-parameterized structure leading to loss of identifiability. In general,
significant prior knowledge (e.g., observability indices for outputs) is needed to
obtain a parsimonious parameterization. Starting with an identifiable structure is important in view of the fact that many of the model structures lead
to a nonlinear parameter estimation problem.
Note on Loss of Identifiability: For certain model structures, over-parameterization can result in parameter values becoming nonunique. This is referred to as loss of identifiability. For example, assume that the underlying system is the first-order ARMAX system
$$y(k) = \frac{\beta_1q^{-1}}{1+\alpha_1q^{-1}}u(k) + \frac{1+\gamma_1q^{-1}}{1+\alpha_1q^{-1}}\varepsilon(k). \tag{12.20}$$
Suppose that the second-order ARMAX model
$$y(k) = \frac{b_1q^{-1}+b_2q^{-2}}{1+a_1q^{-1}+a_2q^{-2}}u(k) + \frac{1+c_1q^{-1}+c_2q^{-2}}{1+a_1q^{-1}+a_2q^{-2}}\varepsilon(k) \tag{12.21}$$
is fitted to data from this system. Then any parameterization of the form
$$y(k) = \frac{\beta_1q^{-1}(1+\delta q^{-1})}{(1+\alpha_1q^{-1})(1+\delta q^{-1})}u(k) + \frac{(1+\gamma_1q^{-1})(1+\delta q^{-1})}{(1+\alpha_1q^{-1})(1+\delta q^{-1})}\varepsilon(k) \tag{12.22}$$
matches the system for an arbitrary value of δ, and hence the parameters cannot be determined uniquely from the data.
12.2.2 Prediction Error Method

The optimal one-step-ahead predictor for system (12.1) can be written as
$$\hat{y}(k|k-1) = \left[1-H^{-1}(q,\theta)\right]y(k) + H^{-1}(q,\theta)G(q,\theta)u(k) \tag{12.23}$$
By comparing (12.1) with (12.23), we see that the prediction error $(y(k)-\hat{y}(k|k-1))$ is simply the white noise ε(k), which verifies the assertion that (12.23) is the optimal predictor. It is then natural to estimate the parameters by minimizing the sum of squared prediction errors,
$$\hat{\theta}_N = \arg\min_\theta\sum_{k=1}^{N}\|e_{pred}(k,\theta)\|_2^2 \tag{12.24}$$
where $e_{pred}(k,\theta) = y(k)-\hat{y}(k|k-1)$ and $\|\cdot\|_2$ denotes the Euclidean norm.
Use of other norms is possible, but the 2-norm is by far the most popular.
Using (12.23), we can write
$$e_{pred}(k,\theta) = H^{-1}(q,\theta)\left(y(k)-G(q,\theta)u(k)\right) \tag{12.25}$$
This method of obtaining parameter estimates is referred to as the Prediction Error Method (PEM). The numerical complexity of PEM depends on the
model structure.
For certain model structures, the 2-norm minimization of the prediction error is formulated as a linear least-squares problem. For example, for the ARX structure, $G(q,\theta) = \frac{B(q)}{A(q)}$ and $H(q,\theta) = \frac{1}{A(q)}$, and
$$e_{pred}(k,\theta) = A(q)y(k)-B(q)u(k) = y(k)+a_1y(k-1)+\cdots+a_ny(k-n)-b_1u(k-1)-\cdots-b_mu(k-m) \tag{12.26}$$
Since $e_{pred}(k,\theta)$ is linear with respect to the unknown parameters, the minimization of $\sum_{k=1}^{N}e_{pred}^2(k,\theta)$ is a linear least squares problem.
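Since the ARX prediction error (12.26) is linear in the parameters, the fit is a single call to a linear least-squares solver. A minimal sketch (synthetic first-order data; parameter values are illustrative) follows.

```python
# Sketch: ARX identification by linear least squares (12.26).
import numpy as np

rng = np.random.default_rng(4)
N = 2000
a1, b1 = -0.8, 0.5                       # true parameters
u = rng.standard_normal(N)               # persistently exciting input
e = 0.1 * rng.standard_normal(N)         # white equation error

y = np.zeros(N)
for k in range(1, N):
    y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + e[k]

Phi = np.column_stack([-y[:-1], u[:-1]])          # regressors phi(k)
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta)   # converges to [a1, b1] as N grows (white e => consistency)
```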
Similarly, for the FIR model with a known noise model H(q),
$$e_{pred}(k,\theta) = y_f(k)-B(q)u_f(k) \tag{12.27}$$
where $y_f(k) = H^{-1}(q)y(k)$ and $u_f(k) = H^{-1}(q)u(k)$. Again, the expression is linear in the unknowns and the prediction error minimization (PEM) is a linear least squares problem. If the noise model were $\frac{1}{1-q^{-1}}H(q)$, then $y_f(k)$ and $u_f(k)$ can be redefined as $H^{-1}(q)\Delta y(k)$ and $H^{-1}(q)\Delta u(k)$, respectively. The same idea applies to Laguerre or other orthogonal expansion models.
PEM for other model structures, such as the ARMAX and Box-Jenkins structures, is not a linear least squares problem, but pseudo-linear regression can be used for them. For example, the optimal predictor for the ARMAX model can be written in the following form (see Exercise ??):
$$\hat{y}(k|k-1) = B(q)u(k) + [1-A(q)]y(k) + [C(q)-1]\left[y(k)-\hat{y}(k|k-1)\right] \tag{12.28}$$
The above equation is called the pseudo-linear regression form of the ARMAX model. Since the right-hand side of the equation depends only on the past values of y(k) and $\hat{y}(k|k-1)$ (as the leading coefficients of A(q) and C(q) are both 1), by treating the past predictions as if they were optimal (which they aren't, since they are based only on coarse estimates of the system parameters), one can recursively update the parameter values using the linear least squares method.
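Pseudo-linear regression can be sketched in a few lines: the unknown innovations in the regressor are replaced by the current residuals and the least squares fit is repeated. The following batch variant (an illustrative simplification of the recursive scheme described above, not the book's algorithm) shows the idea on a first-order ARMAX system.

```python
# Sketch: batch pseudo-linear regression for y(k)+a y(k-1)=b u(k-1)+e(k)+c e(k-1).
import numpy as np

rng = np.random.default_rng(5)
N = 5000
a, b, c = -0.7, 1.0, 0.4                 # true parameters
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a * y[k-1] + b * u[k-1] + e[k] + c * e[k-1]

theta = np.zeros(3)                      # estimates of [a, b, c]
res = np.zeros(N)                        # residual estimates of e(k)
for _ in range(10):                      # repeat LS with updated residuals
    Phi = np.column_stack([-y[:-1], u[:-1], res[:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    res[1:] = y[1:] - Phi @ theta        # new residuals from current fit
print(theta)                             # approaches [a, b, c]
```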
12.2.3 The Linear Least Squares Method

We just saw that prediction error minimization for many model structures can be cast as a linear regression problem. The general linear problem can be written as

\[ y(k) = \varphi^{T}(k)\,\theta + e(k), \qquad (12.29) \]

where y is the observed output (or filtered output), \varphi is the regressor vector, \theta is the parameter vector to be identified, and e is the residual error (which depends on the choice of \theta); \{\cdot\}(k) denotes the kth sample. In least squares identification, \theta is found such that the squared sum of the residuals is minimized, i.e., \hat{\theta}_N^{LS} = \arg\min_{\theta} \sum_{k=1}^{N} e^{2}(k). The 2-norm minimization of the prediction error for certain model structures can be cast in this form.
For a data set collected over N sample intervals, (12.29) can be written collectively as the following set of linear equations:

\[ Y_N = \Phi_N \theta + E_N, \qquad (12.30) \]

where

\[ \Phi_N = \left[ \varphi(1)\ \cdots\ \varphi(N) \right]^{T}, \qquad (12.31) \]
\[ Y_N = \left[ y(1)\ \cdots\ y(N) \right]^{T}, \qquad (12.32) \]
\[ E_N = \left[ e(1)\ \cdots\ e(N) \right]^{T}. \qquad (12.33) \]

The least squares estimate is then given explicitly by

\[ \hat{\theta}_N^{LS} = \left( \Phi_N^{T} \Phi_N \right)^{-1} \Phi_N^{T} Y_N. \qquad (12.34) \]
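As an illustration, the following Python sketch assembles \Phi_N and Y_N for an ARX(n, m) structure and solves (12.34); the helper name and indexing conventions are ours.

```python
# A minimal sketch of (12.30)-(12.34) for an ARX(n, m) structure.
import numpy as np

def arx_ls(y, u, n, m):
    d = max(n, m)
    # stack phi(k)^T = [-y(k-1) ... -y(k-n), u(k-1) ... u(k-m)] row by row
    Phi = np.column_stack(
        [-y[d - i:len(y) - i] for i in range(1, n + 1)]
        + [u[d - i:len(u) - i] for i in range(1, m + 1)])
    Y = y[d:]
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # (Phi^T Phi)^-1 Phi^T Y
    return theta                                      # [a_1..a_n, b_1..b_m]
```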
Convergence

For the discussion of convergence, let us assume that the underlying system (from which the data are obtained) is represented by the model

\[ y(k) = \varphi^{T}(k)\,\theta_o + \nu(k), \qquad (12.35) \]

where \theta_o is the true parameter vector in this context and \nu(k) is a term due to disturbances, noise, etc.

Some insight can be drawn by rewriting the least squares solution in the following form:

\[ \hat{\theta}_N^{LS} = \left[ \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right]^{-1} \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\left( \varphi^{T}(k)\theta_o + \nu(k) \right) = \theta_o + \left[ \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right]^{-1} \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\nu(k). \qquad (12.36) \]

The second term of (12.36) represents the error in the parameter estimate. Assume that \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) exists (which is true if \varphi(k) is quasi-stationary). In order that

\[ \lim_{N\to\infty} \left[ \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right]^{-1} \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\nu(k) = 0, \qquad (12.37) \]

the following two conditions must hold:

1. \[ \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\nu(k) = 0, \qquad (12.38) \]

2. \[ \mathrm{rank}\left\{ \lim_{N\to\infty} \left[ \frac{1}{N} \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right] \right\} = \dim\{\theta\}. \qquad (12.39) \]
The first condition is satisfied if the regressor vector and the residual sequence are uncorrelated. There are two scenarios under which this condition holds:

- \nu(k) is a zero-mean white sequence. Since \varphi(k) does not contain \nu(k), E\{\varphi(k)\nu(k)\} = 0 and \frac{1}{N}\sum_{k=1}^{N}\varphi(k)\nu(k) \to 0 as N \to \infty. In the prediction error minimization, if the assumed model structure is unbiased, \nu(k) is white.

- \varphi(k) is independent of \nu(k) and the mean of at least one of them is zero. For instance, in the case of an FIR model (or an orthogonal expansion model), \varphi(k) contains inputs only and is therefore independent of \nu(k) whether \nu is white or nonwhite (assuming the data were collected in an open-loop fashion). This means that the FIR parameters can be made to converge to the true values even if the disturbance transfer function H(q) is not known perfectly (resulting in nonwhite prediction errors), as long as u_f(k) is designed to be a zero-mean signal that is independent of \nu(k). The same is true for an OE structure, but not for an ARX structure, since \varphi(k) then contains past outputs that would be correlated with \nu(k) if \nu is a nonwhite sequence.
In order for the second condition to be satisfied, \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) must exist and be nonsingular. The rank condition on the matrix \lim_{N\to\infty} \left[ \frac{1}{N}\sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right] is called the persistent excitation condition, as it is closely related to the notion of persistency of excitation (of an input signal), which we shall discuss in Section 12.2.4.
Statistical Properties
Let us again assume that the underlying system is represented by (12.35). We further assume that \nu(k) is an independent, identically distributed (i.i.d.) random variable sequence of zero mean and variance r_\nu. Then, using (12.36), we can easily see that

\[ E\{\hat{\theta}_N^{LS} - \theta_o\} = E\left\{ \left( \frac{1}{N}\sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right)^{-1} \frac{1}{N}\sum_{k=1}^{N} \varphi(k)\nu(k) \right\} = 0 \qquad (12.40) \]

and

\[ E\left\{ (\hat{\theta}_N^{LS} - \theta_o)(\hat{\theta}_N^{LS} - \theta_o)^{T} \right\} = \left[ \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right]^{-1} \left[ \sum_{k=1}^{N} \varphi(k)\, r_\nu\, \varphi^{T}(k) \right] \left[ \sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right]^{-1} = r_\nu \left( \Phi_N^{T} \Phi_N \right)^{-1}. \qquad (12.41) \]
(12.40) implies that the least squares estimate is unbiased. (12.41) gives the covariance matrix of the parameter estimate. This information can be used to compute confidence intervals. For instance, when a normal distribution is assumed, one can compute an ellipsoid corresponding to a specific confidence level by using a \chi^2 table.
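A hedged Python sketch of how (12.41) might be used in practice: estimate r_\nu from the residuals and form marginal confidence intervals (the helper name and the normal-approximation shortcut are our own choices; a rigorous joint region would use the \chi^2 ellipsoid mentioned above).

```python
# Least squares with estimated parameter covariance per (12.41).
import numpy as np

def ls_with_confidence(Phi, Y, z=1.96):             # z = 1.96 ~ 95% (normal)
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    e = Y - Phi @ theta
    N, p = Phi.shape
    r_nu = e @ e / (N - p)                          # noise variance estimate
    cov = r_nu * np.linalg.inv(Phi.T @ Phi)         # covariance, (12.41)
    half = z * np.sqrt(np.diag(cov))                # marginal intervals
    return theta, theta - half, theta + half
```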
12.2.4 Persistency of Excitation

The convergence condition discussed above requires

\[ \mathrm{rank}\left\{ \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) \right\} = \dim\{\theta\}. \qquad (12.42) \]

An input u(k) is said to be persistently exciting of order n if the matrix

\[ C_u^{n} = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \begin{bmatrix} u(k-1)u(k-1) & u(k-1)u(k-2) & \cdots & u(k-1)u(k-n) \\ u(k-2)u(k-1) & u(k-2)u(k-2) & \cdots & u(k-2)u(k-n) \\ \vdots & \vdots & & \vdots \\ u(k-n)u(k-1) & u(k-n)u(k-2) & \cdots & u(k-n)u(k-n) \end{bmatrix} \qquad (12.44) \]

is nonsingular. The above is equivalent to requiring the power spectrum of u(k) to be nonzero at n or more distinct frequency points between -\pi and \pi.

Now, suppose \varphi(k) consists of past inputs and outputs. A necessary and sufficient condition for (12.42) to hold is that the input is persistently exciting of order \dim\{\theta\}.
This follows trivially in the case that \varphi(k) is made up of n past inputs only (as in FIR models). In this case,

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} \varphi(k)\varphi^{T}(k) = C_u^{n}. \qquad (12.45) \]

The condition also holds when \varphi(k) contains filtered past inputs u_f(k-1), \ldots, u_f(k-n) (where u_f(k) = H^{-1}(q)u(k)). Note that

\[ \Phi_{u_f}(\omega) = \frac{\Phi_u(\omega)}{|H(e^{j\omega})|^{2}}, \qquad (12.46) \]

so the filtering does not change the set of frequencies at which the input power is nonzero.
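The persistent excitation condition can be checked numerically from data. The sketch below (our own helper) forms the sample version of the matrix in (12.44) and inspects its rank:

```python
# Check whether a given input sequence is persistently exciting of order n
# by forming the sample covariance matrix of (12.44).
import numpy as np

def pe_order_at_least(u, n, tol=1e-8):
    N = len(u)
    U = np.column_stack([u[n - i:N - i] for i in range(1, n + 1)])
    C = U.T @ U / (N - n)        # sample version of C_u^n
    return np.linalg.matrix_rank(C, tol=tol) == n

u_step = np.ones(200)            # a step is persistently exciting of order 1 only
print(pe_order_at_least(u_step, 1), pe_order_at_least(u_step, 2))  # True False
```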
12.2.5 Frequency-Domain Bias Distribution

The PEM criterion (12.24) minimizes \sum_k e_{\mathrm{pred}}^{2}(k,\theta), where e_{\mathrm{pred}}(k,\theta) = H^{-1}(q,\theta)\{y(k) - G(q,\theta)u(k)\}. Suppose the true system is

\[ y(k) = G_o(q)u(k) + H_o(q)\varepsilon(k). \qquad (12.48) \]

Then,

\[ e_{\mathrm{pred}}(k,\theta) = \frac{G_o(q) - G(q,\theta)}{H(q,\theta)}\,u(k) + \frac{H_o(q)}{H(q,\theta)}\,\varepsilon(k). \qquad (12.49) \]

By Parseval's theorem,

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} e_{\mathrm{pred}}^{2}(k,\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_{e_{\mathrm{pred}}}(\omega)\,d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \left| G_o(e^{j\omega}) - G(e^{j\omega},\theta) \right|^{2} \frac{\Phi_u(\omega)}{|H(e^{j\omega},\theta)|^{2}} + \frac{|H_o(e^{j\omega})|^{2}}{|H(e^{j\omega},\theta)|^{2}}\,\Phi_\varepsilon(\omega) \right) d\omega. \qquad (12.50) \]
To obtain some insight, let us assume for the moment that the noise model does not contain any unknown parameter, i.e., H(q,\theta) = H(q). Then

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} e_{\mathrm{pred}}^{2}(k,\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \left| G_o(e^{j\omega}) - G(e^{j\omega},\theta) \right|^{2} \frac{\Phi_u(\omega)}{|H(e^{j\omega})|^{2}} + \frac{|H_o(e^{j\omega})|^{2}}{|H(e^{j\omega})|^{2}}\,\Phi_\varepsilon(\omega) \right) d\omega. \qquad (12.51) \]
Since the last term of the integrand is unaffected by the choice of \theta in this case, we may conclude that PEM selects \theta such that the L_2-norm of the error G_o(q) - G(q,\theta), weighted by the filtered input spectrum \Phi_{u_f}(\omega) (where u_f(k) = H^{-1}(q)u(k)), is minimized. An implication is that, in order to obtain a good frequency response estimate in a certain frequency region, the filtered input u_f must be designed so that its power is concentrated in that region. If we want good frequency response estimates over the entire frequency range, an input signal with a flat spectrum (e.g., a sequence of independent, zero-mean random variables) is the best choice.
The frequency-domain bias distribution can be made more flexible by adding the option of prefiltering the input-output data before applying the PEM. Prefiltering both u(k) and y(k) with a filter L(q) changes the asymptotic criterion to

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \left| G_o(e^{j\omega}) - G(e^{j\omega},\theta) \right|^{2} \frac{|L(e^{j\omega})|^{2}\,\Phi_u(\omega)}{|H(e^{j\omega})|^{2}} + \frac{|L(e^{j\omega})|^{2}\,|H_o(e^{j\omega})|^{2}}{|H(e^{j\omega})|^{2}}\,\Phi_\varepsilon(\omega) \right) d\omega. \qquad (12.52) \]

Hence, by pre-filtering the data before the parameter estimation, one can affect the bias distribution. This is a useful flexibility, as it is not always easy to shape or change the input spectrum; for example, new data may have to be collected in order to change the input spectrum.
Finally, we have based our argument on the case where the noise model does not contain any unknown parameter. When the noise model contains parameters that are shared with the process model (as in ARX or ARMAX models), the noise spectrum |H_o(e^{j\omega})|^{2} does have an effect on the bias distribution. However, the qualitative effect of the input spectrum and of prefiltering remains the same.
Example 12.1 Show the effect of prefiltering on the frequency bias
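In the spirit of Example 12.1, here is a small self-contained Python experiment (entirely our own toy setup, with arbitrary system and filter choices): a short FIR model is fitted to data from a slower system, with and without a low-pass prefilter L(q), and the frequency response errors at the lowest frequencies are compared.

```python
# Toy demonstration of how prefiltering redistributes the bias of a
# deliberately undermodeled (truncated) FIR fit toward low frequencies.
import numpy as np
from scipy.signal import lfilter, freqz

rng = np.random.default_rng(1)
u = rng.standard_normal(4000)
y = lfilter([0, 0.2], [1, -0.9], u)           # "true" system, slow pole at 0.9

def fir_fit(yf, uf, n=5):
    Phi = np.column_stack([uf[n - i:len(uf) - i] for i in range(1, n + 1)])
    h, *_ = np.linalg.lstsq(Phi, yf[n:], rcond=None)
    return h

b_L, a_L = [0.1], [1, -0.9]                    # low-pass prefilter L(q)
h_plain = fir_fit(y, u)
h_filt = fir_fit(lfilter(b_L, a_L, y), lfilter(b_L, a_L, u))

w, G0 = freqz([0, 0.2], [1, -0.9], worN=8)
for h in (h_plain, h_filt):
    _, G = freqz(np.r_[0, h], [1], worN=8)
    # errors at the lowest frequencies; typically smaller for the
    # prefiltered fit, at the expense of the high-frequency fit
    print(np.abs(G - G0)[:3])
```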
12.2.6 Maximum Likelihood and Bayesian Estimation

Maximum Likelihood Estimation

For the linear model (12.29) with Gaussian noise of variance r_\nu, the probability density function of the observation y(k) is

\[ dF(\eta;\,y(k)) = \frac{1}{\sqrt{2\pi r_\nu}} \exp\left( -\frac{(\eta - \varphi^{T}(k)\theta)^{2}}{2 r_\nu} \right). \qquad (12.54) \]

In the above, \eta represents a particular realized value for y(k) (see Appendix ?? for notation). In performing parametric identification with N data points, we can work with a joint PDF for Y_N = (y(1), \ldots, y(N)). Let us denote the joint PDF as dF(\eta_N;\,Y_N); again, \eta_N is a variable representing a realization of Y_N. Suppose the actual observations (the data) are given as \bar{Y}_N = (\bar{y}(1), \ldots, \bar{y}(N)). Once we insert these values into the probability density function, dF(\bar{Y}_N;\,Y_N) becomes a function of the parameters alone; this function is called the likelihood function, and the maximum likelihood (ML) estimate is the parameter value that maximizes it:

\[ \hat{\theta}_N^{ML} = \arg\max_{\theta}\, dF(\bar{Y}_N;\,Y_N). \qquad (12.55) \]
Let us apply the maximum likelihood method to the following linear identification problem:

\[ Y_N = \Phi_N \theta + E_N. \qquad (12.56) \]

In the above, we assume that E_N is a zero-mean Gaussian vector with covariance R_E. Then, we have

\[ dF(\bar{Y}_N;\,Y_N) = dF(\bar{Y}_N - \Phi_N\theta;\,E_N) = \frac{1}{\sqrt{(2\pi)^{N}\det(R_E)}} \exp\left\{ -\tfrac{1}{2} (\bar{Y}_N - \Phi_N\theta)^{T} R_E^{-1} (\bar{Y}_N - \Phi_N\theta) \right\}. \qquad (12.57) \]

Hence,

\[ \hat{\theta}_N^{ML} = \arg\max_{\theta}\, dF(\bar{Y}_N;\,Y_N) = \arg\max_{\theta} \left\{ -\tfrac{1}{2} (\bar{Y}_N - \Phi_N\theta)^{T} R_E^{-1} (\bar{Y}_N - \Phi_N\theta) \right\}. \qquad (12.58) \]
Note that the above is a weighted least squares estimator. We see that, when the weighting matrix is chosen as the inverse of the covariance matrix of the output error term E_N, weighted least squares estimation is equivalent to maximum likelihood estimation. In addition, the unweighted least squares estimator is a maximum likelihood estimator for the case when the output error is an i.i.d. Gaussian sequence, in which case the covariance matrix for E_N is of the form r_\nu I_N.
Bayesian Estimation

Bayesian estimation is a philosophically different approach to the parameter estimation problem. In this approach, the parameters themselves are viewed as random variables with a certain prior probability distribution. If the observations are described in terms of the parameter vector, the probability distribution of the parameter vector changes after the observations. The posterior distribution follows from Bayes' rule,

\[ dF(\theta \,|\, \bar{Y}_N) = \frac{ dF(\bar{Y}_N \,|\, \theta)\, dF(\theta) }{ dF(\bar{Y}_N) }, \qquad (12.59) \]

and the maximum a posteriori (MAP) estimate is

\[ \hat{\theta}_N^{MAP} = \arg\max_{\theta}\, dF(\bar{Y}_N \,|\, \theta)\, dF(\theta). \qquad (12.60) \]

Note that we end up with a parameter value that maximizes the product of the likelihood function and the prior density.
Let us again apply this concept to the linear parameter estimation problem of

\[ Y_N = \Phi_N \theta + E_N. \qquad (12.61) \]

Assume that the prior distribution of \theta is Gaussian with mean \bar{\theta}(0) and covariance P(0), and recall the general Gaussian density

\[ dF(\bar{x};\,x) = \frac{1}{\sqrt{(2\pi)^{N}\det(R)}} \exp\left( -\tfrac{1}{2} (\bar{x} - \bar{x}_x)^{T} R^{-1} (\bar{x} - \bar{x}_x) \right) \qquad (12.63) \]

for a random vector x with mean \bar{x}_x and covariance R. The MAP estimate can be obtained by maximizing the logarithm of the posterior PDF:

\[ \hat{\theta}_N^{MAP} = \arg\max_{\theta} \left\{ -\tfrac{1}{2} (\bar{Y}_N - \Phi_N\theta)^{T} R_E^{-1} (\bar{Y}_N - \Phi_N\theta) - \tfrac{1}{2} (\theta - \bar{\theta}(0))^{T} P^{-1}(0)\, (\theta - \bar{\theta}(0)) \right\}, \qquad (12.65) \]

which gives

\[ \hat{\theta}_N^{MAP} = \left[ \Phi_N^{T} R_E^{-1} \Phi_N + P^{-1}(0) \right]^{-1} \left[ \Phi_N^{T} R_E^{-1} \bar{Y}_N + P^{-1}(0)\,\bar{\theta}(0) \right]. \qquad (12.66) \]
The MAP estimate can also be computed recursively, one sample at a time:

\[ \hat{\theta}(k) = \hat{\theta}(k-1) + \frac{P(k-1)\varphi(k)}{r_\nu + \varphi^{T}(k)P(k-1)\varphi(k)} \left( y(k) - \varphi^{T}(k)\hat{\theta}(k-1) \right), \]
\[ P(k) = P(k-1) - \frac{P(k-1)\varphi(k)\varphi^{T}(k)P(k-1)}{r_\nu + \varphi^{T}(k)P(k-1)\varphi(k)}. \qquad (12.67) \]

Here \hat{\theta}(k) represents \hat{\theta}_k^{MAP}, or E\{\theta \,|\, Y_k\}, and P(k) = E\{ (\theta - \hat{\theta}(k))(\theta - \hat{\theta}(k))^{T} \,|\, Y_k \}. Derivation of the above formula is straightforward when one considers the problem as a special case of the Kalman filter.
The Kalman filter connection also accommodates time-varying parameters. Consider

\[ \theta(k+1) = \theta(k) + \nu_1(k), \qquad y(k) = \varphi^{T}(k)\theta(k) + \nu_2(k), \qquad (12.68) \]

where \nu_1(k) and \nu_2(k) are i.i.d. Gaussian sequences. Here the parameter vector \theta(k) is assumed to be time-varying in a random-walk fashion. One may also model \nu_1(k) and \nu_2(k) as auto-correlated signals by further augmenting the state vector. The recursive Bayesian estimator for the above can be derived using the Kalman filter technique (see Exercise ???).
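A compact Python sketch of the recursive update (12.67) follows (our own function; the prior \hat{\theta}(0) and P(0) would come from (12.66), or P(0) can be chosen large for a diffuse prior):

```python
# One step of recursive least squares / recursive MAP per (12.67).
import numpy as np

def rls_update(theta, P, phi, y, r_nu=1.0):
    denom = r_nu + phi @ P @ phi
    K = P @ phi / denom                       # update gain
    theta = theta + K * (y - phi @ theta)     # correct with prediction error
    P = P - np.outer(P @ phi, phi @ P) / denom
    return theta, P
```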
We will demonstrate an application of the Bayesian approach to impulse response coefficient identification through the following example.

Example 12.2 In practice, it may be more appropriate to assume (prior to the identification) that the derivatives of the impulse response are zero-mean random variables with a Gaussian distribution, and to specify the covariance of the derivative of the impulse response coefficients. In other words, one may specify

\[ E\left\{ \left.\frac{dh}{dt}\right|_{t=iT_s} \right\} \approx E\left\{ \frac{h_i - h_{i-1}}{T_s} \right\} = 0, \qquad 1 \le i \le n, \qquad (12.69) \]

\[ E\left\{ \left( \left.\frac{dh}{dt}\right|_{t=iT_s} \right)^{2} \right\} \approx E\left\{ \left( \frac{h_i - h_{i-1}}{T_s} \right)^{2} \right\} = \frac{\sigma_i^{2}}{T_s^{2}}, \qquad 1 \le i \le n. \qquad (12.70) \]

In this case, P(0) (the covariance of \theta = [h_1 \cdots h_n]^{T}) takes the form

\[ P(0) = L^{-1}\,\Sigma\,L^{-T}, \qquad L = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ -1 & 1 & \ddots & \vdots \\ & \ddots & \ddots & 0 \\ 0 & & -1 & 1 \end{bmatrix}, \qquad \Sigma = \mathrm{diag}(\sigma_1^{2}, \ldots, \sigma_n^{2}), \qquad (12.71) \]

where L is the n \times n first-difference matrix (with h_0 = 0), so that P^{-1}(0) = L^{T}\Sigma^{-1}L. Note that the above translates into penalizing the 2-norm of the difference between two successive impulse response coefficients in the least squares identification method. It is straightforward to use the above concepts to include prior estimates of time constants, etc.
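The following Python sketch implements the MAP estimate (12.66) with the smoothness prior of Example 12.2; the construction of P^{-1}(0) from the first-difference matrix, and the values of sigma and r_nu, are illustrative assumptions.

```python
# MAP estimation of FIR coefficients with a smoothness prior on the
# differences of successive impulse response coefficients, per (12.66)
# and (12.71) with theta_bar(0) = 0 and Sigma = sigma^2 * I.
import numpy as np

def map_fir(Phi, Y, n, sigma=1.0, r_nu=1.0):
    L = np.eye(n) - np.eye(n, k=-1)            # first-difference matrix
    P0_inv = (L.T @ L) / sigma**2              # P(0)^{-1} = L^T Sigma^{-1} L
    A = Phi.T @ Phi / r_nu + P0_inv
    return np.linalg.solve(A, Phi.T @ Y / r_nu)
```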
12.2.7
Other Methods
There are other methods for estimating parameters in the literature. A method
that stands out is the instrumental variable (IV) method. The basic idea
behind this method is that, in order for a model to be good, the prediction error must show little or no correlation with past data. If they show significant correlation, it implies that there is information left in the past data that is not utilized by the predictor.

In the IV method, a set of variables called instruments (denoted by the vector \zeta hereafter) must be defined first; \zeta contains some transformations (linear or nonlinear) of the past data (y(k-1), \ldots, y(0), u(k-1), \ldots, u(0)). Then, \hat{\theta} is determined from the following relation:

\[ \frac{1}{N} \sum_{k=1}^{N} \zeta(k)\, e_{\mathrm{pred}}(k, \hat{\theta}) = 0. \qquad (12.72) \]
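For the linear-in-parameters case, (12.72) reduces to a set of linear equations. A minimal Python sketch (our own helper; Z stacks the instruments row by row and must have as many columns as there are parameters):

```python
# Instrumental variable estimate for the linear problem (12.29):
# (1/N) sum zeta(k) e(k) = 0 with e = Y - Phi theta
# leads to (Z^T Phi) theta = Z^T Y.
import numpy as np

def iv_estimate(Z, Phi, Y):
    # Z: (N, p) instrument matrix; Phi: (N, p) regressor matrix
    return np.linalg.solve(Z.T @ Phi, Z.T @ Y)
```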
12.3 Nonparametric Identification

When one has little prior knowledge about the system, nonparametric identification, which assumes very little about the underlying system, is an alternative. Nonparametric model structures include frequency response models, impulse response models, etc. These model structures intrinsically have no finite-dimensional parameter representations. In reality, however, the dividing line between parametric and nonparametric identification is somewhat blurred: in nonparametric identification, some assumptions are always made about the system structure (e.g., a finite-length impulse response, smoothness of the frequency response) to obtain a well-posed estimation problem. In addition, in parametric identification, a proper choice of model order is often determined by examining the residuals from fitting models of various orders.
12.3.1
Dynamics of a general linear system can be represented by the systems frequency response, which is defined through amplitude ratio and phase angle
at each frequency. The frequency response information is conveniently represented as a complex function of of which the modulus and argument define
the amplitude ratio and the phase angle respectively. Such a function can be
easily derived from the systems transfer function G(q) by replacing q with ej .
Hence, the amplitude ratio and phase angle of the system at each frequency
is related to the transfer function parameters through the following relations:
q
A.R.() = |G(ej )| = Re{G(ej )}2 + Im{G(ej )}2
(12.73)
Im{G(ej )}
() = arg G(ej ) = tan1
(12.74)
Re{G(ej )}
Since G(ej ) (0 for a system with sample time of 1) defines system dynamics completely, one approach to system identification is to identify
G(ej ) directly. This belongs to the category of nonparametric identification
as frequency response is not parametrized by a finite-dimensional parameter
vector. (There are infinite number of frequency points in {0 }.)
Frequency Response Computation

The most immediate way to identify the frequency response is through sine-wave testing, where sinusoidal perturbations are made directly to the system input at different frequencies. Though conceptually straightforward, this method is of limited practical value since (1) sinusoidal perturbations are difficult to make in practice, and (2) each experiment gives the frequency response at only a single frequency.

A more practical approach is to use results from Fourier analysis. From the z-domain input/output relationship, it is immediate that, for the system y(k) = G(q)u(k),

\[ G(e^{j\omega}) = \frac{Y(\omega)}{U(\omega)}, \qquad (12.75) \]

where

\[ Y(\omega) = \sum_{k=1}^{\infty} y(k)\, e^{-j\omega k}, \qquad (12.76) \]
\[ U(\omega) = \sum_{k=1}^{\infty} u(k)\, e^{-j\omega k}. \qquad (12.77) \]

Hence, by dividing the Fourier transform of the output data by that of the input data, one can compute the system's frequency response. What complicates frequency response identification in practice is that one only has a finite-length data record. In addition, the output data are corrupted by noise and disturbances.
Let us assume that the underlying system is represented by

\[ y(k) = G(q)u(k) + e(k). \qquad (12.78) \]

Define the finite-length (scaled) Fourier transforms

\[ Y_N(\omega) = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} y(k)\, e^{-j\omega k}, \qquad (12.79) \]
\[ U_N(\omega) = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} u(k)\, e^{-j\omega k}, \qquad (12.80) \]
\[ E_N(\omega) = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} e(k)\, e^{-j\omega k}. \qquad (12.81) \]

Then,

\[ \hat{G}_N(\omega) = \frac{Y_N(\omega)}{U_N(\omega)} = G(e^{j\omega}) + \frac{R_N(\omega)}{U_N(\omega)} + \frac{E_N(\omega)}{U_N(\omega)}, \qquad (12.82) \]

where |R_N(\omega)| \le \frac{c_1}{\sqrt{N}} accounts for the finite data length. \hat{G}_N(\omega) evaluated at the grid of frequency points is an estimate of the true system frequency response G(e^{j\omega}) and will be referred to as the Empirical Transfer Function Estimate (ETFE).
Statistical Properties of the ETFE

Let us take the expectation of (12.82):

\[ E\{\hat{G}_N(\omega)\} = E\left\{ G(e^{j\omega}) + \frac{R_N(\omega)}{U_N(\omega)} + \frac{E_N(\omega)}{U_N(\omega)} \right\} = G(e^{j\omega}) + \frac{R_N(\omega)}{U_N(\omega)}. \qquad (12.83) \]

For the variance, one can show that

\[ E\left\{ \left| \hat{G}_N(\omega) - G(e^{j\omega}) \right|^{2} \right\} = \frac{\Phi_e(\omega) + \rho_N}{|U_N(\omega)|^{2}}, \qquad \text{where } \rho_N \le \frac{c_2}{N}. \qquad (12.84) \]

Since the second term on the RHS of (12.83) decays as \frac{1}{\sqrt{N}}, \hat{G}_N(\omega) is an asymptotically unbiased estimate of G(e^{j\omega}).
To reduce the variance of the ETFE, estimates at neighboring frequency points can be averaged using a smoothing window W_s:

\[ \hat{G}_N^{s}(\omega) = \frac{ \int_{-\pi}^{\pi} W_s(\xi - \omega)\, |U_N(\xi)|^{2}\, \hat{G}_N(\xi)\, d\xi }{ \int_{-\pi}^{\pi} W_s(\xi - \omega)\, |U_N(\xi)|^{2}\, d\xi }. \qquad (12.85) \]

W_s is a function that is centered around zero and is symmetric. It usually includes a parameter that determines the width of the smoothing window and therefore the trade-off between bias and variance: a larger window reduces variance but increases bias, and vice versa. Again, the variance can be shown to decay as 1/N under a nonzero smoothing window.
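A Python sketch of the ETFE and a smoothed version follows; the window choice and normalization are our own illustrative assumptions, not the text's prescription.

```python
# ETFE per (12.82) via the FFT, plus a smoothed variant per (12.85)
# using a sliding window weighted by |U_N|^2.
import numpy as np

def etfe(y, u):
    N = len(y)
    U = np.fft.rfft(u)
    G = np.fft.rfft(y) / U                 # Y_N(w)/U_N(w) on the FFT grid
    w = np.arange(len(G)) * 2 * np.pi / N
    return w, G, U

def smooth_etfe(G, U, width=5):
    # weight each frequency point by |U_N|^2 and average over a window
    win = np.hamming(2 * width + 1)
    num = np.convolve(G * np.abs(U)**2, win, mode='same')
    den = np.convolve(np.abs(U)**2, win, mode='same')
    return num / den
```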
12.3.2 Impulse Response Identification via Correlation Analysis

Suppose the system is represented by the impulse response model

\[ y(k) = \sum_{i=1}^{\infty} H_i\, u(k-i) + e(k). \qquad (12.86) \]

Multiplying both sides by u^{T}(k-\tau) gives

\[ y(k)u^{T}(k-\tau) = \sum_{i=1}^{\infty} H_i\, u(k-i)u^{T}(k-\tau) + e(k)u^{T}(k-\tau). \qquad (12.87) \]

Averaging over the data,

\[ \frac{1}{N}\sum_{k=1}^{N} y(k)u^{T}(k-\tau) = \sum_{i=1}^{\infty} H_i \left( \frac{1}{N}\sum_{k=1}^{N} u(k-i)u^{T}(k-\tau) \right) + \frac{1}{N}\sum_{k=1}^{N} e(k)u^{T}(k-\tau). \qquad (12.88) \]

Assuming the signals are quasi-stationary, we can take the limit of the above as N \to \infty to obtain

\[ R_{yu}(\tau) = \sum_{i=1}^{\infty} H_i\, R_{uu}(\tau - i) + R_{eu}(\tau), \qquad (12.89) \]

where

\[ R_{yu}(\tau) = \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} y(k)u^{T}(k-\tau), \qquad (12.90) \]
\[ R_{uu}(\tau) = \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} u(k)u^{T}(k-\tau), \qquad (12.91) \]
\[ R_{eu}(\tau) = \lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^{N} e(k)u^{T}(k-\tau). \qquad (12.92) \]

The above equation can also be derived from a statistical argument. More specifically, we can take the expectation of (12.87) to obtain

\[ E\{y(k)u^{T}(k-\tau)\} = \sum_{i=1}^{\infty} H_i\, E\{u(k-i)u^{T}(k-\tau)\} + E\{e(k)u^{T}(k-\tau)\}. \qquad (12.93) \]

Assuming \{u(k)\} and \{e(k)\} are stationary sequences, R_{uu}, R_{yu} and R_{eu} are simply the expectations in the above.

Now, let us assume that \{u(k)\} is a zero-mean stationary sequence that is uncorrelated with \{e(k)\}; \{e(k)\} is assumed to be stationary as well. Then, R_{eu}(\tau) \to 0 as N \to \infty. Let us also assume that H_i = 0 for i > n. Such an n can be determined by examining R_{yu}(\tau) under a white noise perturbation; when the input signal is white, R_{uu}(i) = 0 except for i = 0. From the above, it is clear that
the correlation equations reduce to the finite set

\[ \left[ R_{yu}(1)\ \ R_{yu}(2)\ \cdots\ R_{yu}(n) \right] = \left[ H_1\ H_2\ \cdots\ H_n \right] \begin{bmatrix} R_{uu}(0) & R_{uu}(1) & \cdots & R_{uu}(n-1) \\ R_{uu}(-1) & R_{uu}(0) & \cdots & R_{uu}(n-2) \\ \vdots & \vdots & & \vdots \\ R_{uu}(-n+1) & R_{uu}(-n+2) & \cdots & R_{uu}(0) \end{bmatrix}. \qquad (12.94) \]

Taking the transpose of the above equation and rearranging it gives

\[ \begin{bmatrix} H_1^{T} \\ H_2^{T} \\ \vdots \\ H_n^{T} \end{bmatrix} = \begin{bmatrix} R_{uu}(0) & R_{uu}(1) & \cdots & R_{uu}(n-1) \\ R_{uu}(-1) & R_{uu}(0) & \cdots & R_{uu}(n-2) \\ \vdots & \vdots & & \vdots \\ R_{uu}(-n+1) & R_{uu}(-n+2) & \cdots & R_{uu}(0) \end{bmatrix}^{-1} \begin{bmatrix} R_{yu}^{T}(1) \\ R_{yu}^{T}(2) \\ \vdots \\ R_{yu}^{T}(n) \end{bmatrix}. \qquad (12.95) \]

Here we used the fact that R_{uu}^{T}(i) = R_{uu}(-i).

In practice, one has to approximate R_{uu}(i) and R_{yu}(i) with finite-length data. With such approximations, the parameter variance can be significant. However, because we limited the number of impulse response coefficients to n by assuming H_i = 0 for i > n, we only need estimates for \{R_{uu}(i),\ i = 0, \ldots, n-1\} and \{R_{yu}(i),\ i = 1, \ldots, n\}. Hence, the variance decays as 1/N (assuming the matrix remains nonsingular). However, some bias results because of the truncation. Again, the choice of n determines the trade-off between the variance and the bias.

It is noteworthy that (12.95) gives virtually the same estimate as least squares identification when the number of data points is large. (See Exercise ???)
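For a SISO system, (12.95) can be implemented in a few lines of Python; the helper below (our own) estimates the correlation functions from finite data and solves the Toeplitz system:

```python
# Correlation-based FIR estimation per (12.95), SISO case.
import numpy as np
from scipy.linalg import toeplitz, solve

def corr_fir(y, u, n):
    N = len(u)
    # sample correlations R_uu(0..n-1) and R_yu(1..n)
    Ruu = np.array([u[tau:] @ u[:N - tau] / N for tau in range(n)])
    Ryu = np.array([y[tau:] @ u[:N - tau] / N for tau in range(1, n + 1)])
    return solve(toeplitz(Ruu), Ryu)       # [H_1 ... H_n]
```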
12.4 Subspace Identification

In identifying a multivariable system, it is advantageous to use a model structure that includes cross-correlation among the outputs. Not only can this improve identification of the deterministic part; the multivariable noise model so identified can also be useful in certain applications, e.g., those that require cross-channel feedback updates for inferential control. To build a model with cross-correlation information, however, MIMO identification (rather than SISO or MISO identification) is needed. Polynomial models like a MIMO ARMAX model are generally difficult to work with in this context, since they give rise to numerically ill-conditioned, nonlinear estimation problems with many local minima. In addition, significant prior knowledge (e.g., system order, observability indices) is needed to obtain a proper model parameterization. An attractive alternative is the subspace identification approach described in this section.
12.4.1 State-Space Models and the Innovation Form

Consider the state-space system

\[ x(k+1) = A x(k) + B u(k) + w(k), \qquad y(k) = C x(k) + \nu(k), \qquad (12.96) \]

where w(k) and \nu(k) are white noise sequences. The usual assumptions of reachability and observability, as well as stationarity, will be enforced here. The following steady-state Kalman filter equation is equivalent to the above system in an input-output sense and is called the innovation form of (12.96):

\[ \hat{x}(k+1) = A \hat{x}(k) + B u(k) + K \varepsilon(k), \qquad y(k) = C \hat{x}(k) + \varepsilon(k). \qquad (12.97) \]

In the above, \hat{x}(k+1) and \hat{x}(k) represent consecutive state estimates from the steady-state Kalman filter, K the Kalman gain, and \varepsilon(k) the corresponding innovation sequence, which is white. The system equation we identify in subspace identification is

\[ \hat{x}_{i+1}(k+1) = A \hat{x}_i(k) + B u(k) + K_i \varepsilon_i(k), \qquad y(k) = C \hat{x}_i(k) + \varepsilon_i(k), \qquad (12.98) \]

where \hat{x}_{i+1}(k+1) and \hat{x}_i(k) represent consecutive estimates from a non-steady-state Kalman filter that was started at time k-i with the initial estimate \hat{x}_0(k-i) = 0 and a certain covariance matrix (specifically, the system's open-loop steady-state covariance matrix). The subscripts [\cdot]_{i+1} and [\cdot]_i denote the number of time steps the Kalman filter has been running. i should be chosen larger than the state dimension n (i.e., choose it larger than an upper bound on n).
(12.98) serves as an approximation to (12.97). Note that the system matrices for the deterministic part of (12.98) (i.e., A, B, C) are the same as those appearing in (12.97). The same is not true for the stochastic parts, but with a reasonably large i, K_i and \mathrm{Cov}\{\varepsilon_i(k)\} would be good approximations of K and \mathrm{Cov}\{\varepsilon(k)\}. The goal is to identify the system matrices (A, B, C, K_i, \mathrm{Cov}\{\varepsilon_i(k)\}), within some state coordinate transformation.
Multi-Step Prediction Equation Based on the Kalman Estimate

As a first step toward this goal, we note that

\[ \begin{bmatrix} y(k) \\ y(k+1) \\ \vdots \\ y(k+i-1) \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{i-1} \end{bmatrix} \hat{x}_i(k) + \begin{bmatrix} 0 & 0 & \cdots & 0 \\ CB & 0 & \cdots & 0 \\ CAB & CB & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA^{i-2}B & \cdots & CB & 0 \end{bmatrix} \begin{bmatrix} u(k) \\ u(k+1) \\ \vdots \\ u(k+i-1) \end{bmatrix} + \begin{bmatrix} e(k|k-1) \\ e(k+1|k-1) \\ \vdots \\ e(k+i-1|k-1) \end{bmatrix}, \qquad (12.99) \]

where the first two terms together represent the optimal predictions for y(k), \ldots, y(k+i-1) based on the Kalman estimate \hat{x}_i(k), and e(k|k-1), \ldots, e(k+i-1|k-1) are the corresponding prediction errors. For the convenience of notation, let us denote the above as

\[ Y_i^{0+}(k) = W_i^{o}\, \hat{x}_i(k) + L_3\, U_i^{0+}(k) + E_i^{0+}(k|k-1). \qquad (12.100) \]
Defining the stacked past data vectors

\[ Y_i^{-}(k) = \begin{bmatrix} y(k-i) \\ \vdots \\ y(k-1) \end{bmatrix}, \qquad U_i^{-}(k) = \begin{bmatrix} u(k-i) \\ \vdots \\ u(k-1) \end{bmatrix}, \qquad (12.101) \]

we can express

\[ \hat{x}_i(k) = \left[ H_1\ H_2 \right] \begin{bmatrix} Y_i^{-}(k) \\ U_i^{-}(k) \end{bmatrix}, \qquad (12.102) \]

where H_1 and H_2 are functions of the system matrices as well as the covariance matrices (including the initial covariance matrix chosen for the non-steady-state Kalman filter). Substituting (12.102) into (12.100) gives

\[ Y_i^{0+}(k) = \underbrace{ W_i^{o} \left[ H_1\ H_2 \right] }_{ \left[ L_1\ L_2 \right] } \begin{bmatrix} Y_i^{-}(k) \\ U_i^{-}(k) \end{bmatrix} + L_3\, U_i^{0+}(k) + E_i^{0+}(k|k-1). \qquad (12.103) \]

In a more general formulation, the initial estimate and the covariance matrix are allowed to depend on U_i^{-}(k) and U_i^{0+}(k) (instead of being chosen as the system's open-loop covariance matrix). This way, the consistency of the least squares estimate can be retained despite the relaxation of the assumption. However, the relationships between the matrices L_1, L_2 and L_3 and the system matrices change, thus complicating the subsequent development. A detailed treatment of this generalization is beyond the scope of this chapter; we list the relevant references at the end of the chapter.
To set up a least squares problem over the whole data set, define the block-Hankel data matrices

\[ U_{0|i-1} = \left[ U_i^{-}(i)\ \ U_i^{-}(i+1)\ \cdots\ U_i^{-}(i+j-1) \right] = \begin{bmatrix} u(0) & u(1) & u(2) & \cdots & u(j-1) \\ u(1) & u(2) & u(3) & \cdots & u(j) \\ \vdots & & & & \vdots \\ u(i-1) & u(i) & u(i+1) & \cdots & u(i+j-2) \end{bmatrix}, \]

\[ U_{i|2i-1} = \left[ U_i^{0+}(i)\ \ U_i^{0+}(i+1)\ \cdots\ U_i^{0+}(i+j-1) \right] = \begin{bmatrix} u(i) & u(i+1) & u(i+2) & \cdots & u(i+j-1) \\ u(i+1) & u(i+2) & u(i+3) & \cdots & u(i+j) \\ \vdots & & & & \vdots \\ u(2i-1) & u(2i) & u(2i+1) & \cdots & u(2i+j-2) \end{bmatrix}, \]

\[ Y_{0|i-1} = \left[ Y_i^{-}(i)\ \ Y_i^{-}(i+1)\ \cdots\ Y_i^{-}(i+j-1) \right] = \begin{bmatrix} y(0) & y(1) & y(2) & \cdots & y(j-1) \\ y(1) & y(2) & y(3) & \cdots & y(j) \\ \vdots & & & & \vdots \\ y(i-1) & y(i) & y(i+1) & \cdots & y(i+j-2) \end{bmatrix}, \]

\[ Y_{i|2i-1} = \left[ Y_i^{0+}(i)\ \ Y_i^{0+}(i+1)\ \cdots\ Y_i^{0+}(i+j-1) \right] = \begin{bmatrix} y(i) & y(i+1) & y(i+2) & \cdots & y(i+j-1) \\ \vdots & & & & \vdots \\ y(2i-1) & y(2i) & y(2i+1) & \cdots & y(2i+j-2) \end{bmatrix}, \qquad (12.104) \]

where j = N - 2i + 1. Then, the least squares estimate for [L_1\ L_2\ L_3] can be written as

\[ \left[ \hat{L}_1\ \hat{L}_2\ \hat{L}_3 \right] = Y_{i|2i-1} \begin{bmatrix} Y_{0|i-1} \\ U_{0|i-1} \\ U_{i|2i-1} \end{bmatrix}^{\dagger}, \qquad (12.105) \]

where \dagger denotes the pseudo-inverse.
Choosing the System Order and Obtaining the Kalman State Data Matrix for \hat{x}_i - The Theory

Since [L_1\ L_2] = W_i^{o}[H_1\ H_2] with i > n, if the system is reachable/observable, [L_1\ L_2] has rank n. Hence, one can obtain the system order simply by examining its rank. This means we are required to store only n linear combinations of Y_i^{-}(k) and U_i^{-}(k) (which define the Kalman state), but we are free to choose any coordinate system within such a state space (recall that the system matrices can only be identified within a state coordinate transformation). Suppose that the SVD of the matrix is given as

\[ \left[ L_1\ L_2 \right] = \left[ W_1\ W_2 \right] \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^{T} \\ V_2^{T} \end{bmatrix}. \qquad (12.106) \]

Taking, e.g., W_i^{o} = W_1 \Sigma^{1/2}, the Kalman state follows as

\[ \hat{x}_i(k) = (W_i^{o})^{\dagger} \left[ L_1\ L_2 \right] \begin{bmatrix} Y_i^{-}(k) \\ U_i^{-}(k) \end{bmatrix} \qquad (12.107) \]
\[ \phantom{\hat{x}_i(k)} = \Sigma^{1/2}\, V_1^{T} \begin{bmatrix} Y_i^{-}(k) \\ U_i^{-}(k) \end{bmatrix}. \qquad (12.108) \]

Using the above equation, we can easily create a data matrix for the Kalman estimate \hat{x}_i(k) from input-output data:

\[ \hat{X}_i = \left[ \hat{x}_i(i)\ \cdots\ \hat{x}_i(N-i) \right] = (W_i^{o})^{\dagger} \left[ \hat{L}_1\ \hat{L}_2 \right] \begin{bmatrix} Y_{0|i-1} \\ U_{0|i-1} \end{bmatrix}. \qquad (12.109) \]
Choosing the System Order and Obtaining the Kalman State Data Matrix \hat{X}_i - The Practice

In practice, one is given only a finite data set, and hence the rank of [\hat{L}_1\ \hat{L}_2], the least squares estimate of [L_1\ L_2], can be significantly higher than n because of various identification errors. A logical way to choose the system order and define the state space is to use the SVD of the matrix

\[ \left[ \hat{L}_1\ \hat{L}_2 \right] \begin{bmatrix} Y_{0|i-1} \\ U_{0|i-1} \end{bmatrix}. \qquad (12.110) \]
Based on the above SVD, \hat{W}_i^{o} can be defined in a similar manner as before. The system order can be determined by examining the singular values and finding a large gap between two successive values, a somewhat subjective decision that involves a trade-off between variance and bias. The state data matrix can be obtained by using the same formula as (12.109).

From the viewpoint of model reduction, choosing the reduced state this way amounts to minimizing the squared sum of the prediction error vectors E_i^{0+}(k|k-1), k = i+1, \ldots, N+1-i, for the given data. It bears a strong resemblance to Hankel-norm model reduction with a frequency weighting given by the input spectrum. This way of choosing the state is particular to the algorithm called N4SID. Various subspace identification methods differ in the way of choosing the system order and the state. For example, in performing the SVD to decide
on the system order and the state coordinate system, one may use instead just the gain matrix

\[ \left[ \hat{L}_1\ \hat{L}_2 \right] \]

or the normalized matrix

\[ \mathrm{Cov}^{-1/2}\{Y_i^{0+}(k)\}\, \left[ \hat{L}_1\ \hat{L}_2 \right]\, \mathrm{Cov}^{1/2}\left\{ \begin{bmatrix} Y_i^{-}(k) \\ U_i^{-}(k) \end{bmatrix} \right\}. \]
The Kalman state estimate \hat{x}_{i+1}(k+1) can be obtained by the same procedure as before, but with Y_i^{-}(k) and U_i^{-}(k) replaced by Y_{i+1}^{-}(k+1) and U_{i+1}^{-}(k+1):

\[ Y_{i+1}^{-}(k+1) = \begin{bmatrix} y(k-i) \\ \vdots \\ y(k-1) \\ y(k) \end{bmatrix}, \qquad U_{i+1}^{-}(k+1) = \begin{bmatrix} u(k-i) \\ \vdots \\ u(k-1) \\ u(k) \end{bmatrix}. \qquad (12.111) \]

Also define

\[ Y_{i-1}^{0+}(k+1) = \begin{bmatrix} y(k+1) \\ \vdots \\ y(k+i-1) \end{bmatrix}, \qquad U_{i-1}^{0+}(k+1) = \begin{bmatrix} u(k+1) \\ \vdots \\ u(k+i-1) \end{bmatrix}, \qquad (12.112) \]

and

\[ \hat{x}_{i+1}(k+1) = \left[ \bar{H}_1\ \bar{H}_2 \right] \begin{bmatrix} Y_{i+1}^{-}(k+1) \\ U_{i+1}^{-}(k+1) \end{bmatrix}. \qquad (12.113) \]

This yields

\[ Y_{i-1}^{0+}(k+1) = \underbrace{ W_{i-1}^{o} \left[ \bar{H}_1\ \bar{H}_2 \right] }_{ \left[ \bar{L}_1\ \bar{L}_2 \right] } \begin{bmatrix} Y_{i+1}^{-}(k+1) \\ U_{i+1}^{-}(k+1) \end{bmatrix} + \bar{L}_3\, U_{i-1}^{0+}(k+1) + E_{i-1}^{0+}(k+1|k), \qquad (12.114) \]

and the corresponding least squares estimate is

\[ \left[ \hat{\bar{L}}_1\ \hat{\bar{L}}_2\ \hat{\bar{L}}_3 \right] = Y_{i+1|2i-1} \begin{bmatrix} Y_{0|i} \\ U_{0|i} \\ U_{i+1|2i-1} \end{bmatrix}^{\dagger}, \qquad (12.115) \]
where the data matrices Y_{i+1|2i-1}, Y_{0|i}, etc. are defined in the same manner as in (12.104).

The state data matrix can then be created by using

\[ \hat{X}_{i+1} = \left[ \hat{x}_{i+1}(i+1)\ \cdots\ \hat{x}_{i+1}(N-i+1) \right] = (\hat{W}_{i-1}^{o})^{\dagger} \left[ \hat{\bar{L}}_1\ \hat{\bar{L}}_2 \right] \begin{bmatrix} Y_{0|i} \\ U_{0|i} \end{bmatrix}, \qquad (12.116) \]

where \hat{W}_{i-1}^{o} is the same \hat{W}_i^{o} used to create \hat{X}_i, but with the last n_y rows eliminated.
Obtaining the System Matrices

Once the data matrices for \hat{x}_i(k) and \hat{x}_{i+1}(k+1) are created according to the above procedure, the system matrices A, B, and C can be estimated using the linear least squares method. In addition, the covariance matrix of \varepsilon_i(k) and K_i can be estimated on the basis of the residuals from the least squares.

Note that the system equation (12.98), in terms of the data matrices, can be written as

\[ \begin{bmatrix} \hat{X}_{i+1} \\ Y_{i|i} \end{bmatrix} = \begin{bmatrix} A & B \\ C & 0 \end{bmatrix} \begin{bmatrix} \hat{X}_i \\ U_{i|i} \end{bmatrix} + \begin{bmatrix} K \\ I \end{bmatrix} \hat{E}_i, \qquad (12.117) \]

where

\[ \hat{E}_i = \left[ \varepsilon_i(i)\ \ \varepsilon_i(i+1)\ \cdots \right] \qquad (12.118) \]

is the innovation data matrix. We can obtain the system matrices A, B and C using least squares as before. From the residuals, we can also calculate estimates of the innovation gain matrix K and the covariance matrix R (of \varepsilon_i(k)). With \hat{E}^{(1)} and \hat{E}^{(2)} denoting the state-equation and output-equation blocks of the least squares residual, one natural construction is

\[ \hat{R} = \frac{1}{j}\, \hat{E}^{(2)} (\hat{E}^{(2)})^{T}, \qquad \hat{K} = \left( \frac{1}{j}\, \hat{E}^{(1)} (\hat{E}^{(2)})^{T} \right) \hat{R}^{-1}. \qquad (12.119) \]
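A hedged Python sketch of this final step follows, assuming the state data matrices \hat{X}_i and \hat{X}_{i+1} (each n x j) and the aligned single-block slices U_{i|i}, Y_{i|i} have already been constructed; the residual-based formulas implement (12.119).

```python
# Final least squares step of subspace identification, per (12.117)-(12.119).
import numpy as np

def subspace_system_matrices(Xi, Xip1, Uii, Yii):
    j = Xi.shape[1]
    lhs = np.vstack([Xip1, Yii])              # [X_{i+1}; Y_{i|i}]
    rhs = np.vstack([Xi, Uii])                # [X_i; U_{i|i}]
    Theta = lhs @ np.linalg.pinv(rhs)         # [[A, B], [C, ~0]]
    n = Xi.shape[0]
    A, B = Theta[:n, :n], Theta[:n, n:]
    C = Theta[n:, :n]
    E = lhs - Theta @ rhs                     # residuals, i.e. [K; I] E_i
    E1, E2 = E[:n, :], E[n:, :]               # state / output residual blocks
    R = E2 @ E2.T / j                         # innovation covariance (12.119)
    K = (E1 @ E2.T / j) @ np.linalg.inv(R)    # innovation gain estimate
    return A, B, C, K, R
```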
The procedure as we have described it, though attractive from a conceptual viewpoint, does not represent the most efficient and numerically stable implementation. The procedure can be implemented in a computationally efficient and stable way by employing QR (or RQ) factorization; for details, see the references given at the end of the chapter.
Properties and Other Issues

The subspace identification method we just described has the following properties:

- The resulting deterministic system model is asymptotically unbiased.
- The estimates for the covariance matrices are biased, however, due to the fact that (12.98) is a non-steady-state Kalman filter. The approximation error diminishes as i \to \infty.

Both results follow straightforwardly from the consistency of the least squares estimation.

Advantages of the subspace identification method over the prediction error method include:

- Structure identification is automatic. It avoids having to try models of different orders, and the need for a special parameterization that requires difficult-to-obtain prior information like the observability indices.
- It yields a stochastic system model in the form of a state estimator without any nonlinear optimization, which is needed, for instance, to identify a MIMO ARMAX model.
- Model reduction is already built into the identification.
However, there are some drawbacks as well. Although the method yields an asymptotically unbiased model, little can be said about the model quality obtained with finite data, and in practice one must work with finite-length data sets. In addition, various non-ideal factors like nonlinearity and nonstationarity make the residual sequence e(k+\ell|k), \ell = 0, \ldots, i-1 in (12.99) become correlated with the regression data. For these reasons, \hat{L}_1, \hat{L}_2 obtained from the least squares identification (which are critical for determining the system order and generating data for the Kalman estimates) may contain significant errors, both variances and biases. Variance will dominate when the upper bound on the state order (i) is set too high. Although the variances of the prediction matrices may be quantifiable, it is difficult to say how they affect the final model quality measured in terms of prediction error, frequency response error, etc. One implication is that, in general, one needs a large amount of data in order to have success with these algorithms, which is only natural since these algorithms use very little prior knowledge. Another implication is that the above does not replace the traditional prediction error method but complements it. For instance, it has been suggested that the subspace method be used to provide a starting estimate for the prediction error minimization.
Another related issue is that, if the input and output data are correlated due to feedback of some sort, the above algorithm can fail. In this case, the prediction errors e(k+\ell|k), \ell = 0, \ldots, i-1 in (12.99) become correlated with the input vector U_i^{0+}(k), and the consistency of the least squares estimation breaks down. Methods have been suggested to overcome this problem. For example, it has been suggested to replace the data for Y^{0+}(k) with one-step-ahead predictions obtained using a high-order ARX model, in order to break the correlation. See the Bibliography section at the end for the reference.

Finally, the requirement that the stochastic part of the system be stationary should not be overlooked. If the nonstationarity arises mainly from the process exhibiting mean changes due to integrating-type disturbances, one can difference the input-output data before applying the algorithm. Further low-pass filtering may be necessary in order not to over-emphasize the high-frequency fit. (Recall the discussion on the frequency-domain bias distribution.)
Example 12.3 Stochastic system model. Identification result.

Example 12.4 Closed-loop data. Subspace ID exhibiting bias. Ljung's method.
12.5 Practical Aspects of Identification

12.5.1 Experiment Design

In most cases, data used for system identification must be generated by performing a test on an actual process. These tests are oftentimes costly and time-consuming to run, as they involve direct interaction with the actual process. For applications in the process industry, it is estimated that up to 90% of the cost and time involved in commissioning a model-based controller can be attributed to the plant testing step. It is therefore important to understand the underlying issues and optimize these tests as much as possible. Despite this importance, the science of designing experiments for system identification has not fully matured yet. Hence, we will resort to some qualitative discussion of the key decisions.
Sample Time: The sample time for data collection is usually preset by the existing hardware (e.g., data acquisition systems, actuators, and sensors). In deciding the input manipulation frequency, however, one needs to keep in mind the theoretical limit given by the Sampling Theorem (see Appendix ??): with a sample time of T_s, no control can be done beyond the Nyquist frequency of \pi/T_s. On the other hand, too small a sample period for a slow system can lead to excessively many parameters (in the case of FIR identification) or numerical difficulties (all poles collapsing around (1,0)).
Experiment Length: This too is largely decided by practical considerations: the longer the experiment, the more the information, but the higher the cost. On the other hand, it is convenient to have, ahead of the actual experiment, an estimate of the minimum length needed to obtain data for a desired closed-loop performance. We showed earlier that calculation of the variance and its distribution in the context of least squares estimation can be carried out under certain assumptions; consideration of these quantities could provide some useful clues.
Test Signal: This probably represents the most important design parameter, as one is often given significant degrees of freedom in its choice and it also has the biggest impact on the model quality. The test signal (i.e., how it distributes a given power among different frequency channels and possibly among different directions in the input space) determines the distribution of the model error (variance and bias) and therefore the achievable closed-loop performance. Popular signals used in plant testing are (series of) steps, pulses, random signals (e.g., random binary sequences), and pseudo-random binary sequence (PRBS) signals. The last two are recommended by theoreticians as they have high power contents and distribute the power evenly through all frequencies (within the Nyquist band). We will discuss the PRBS signal in more detail in the next subsection. However, practical constraints often dictate what signals can be implemented and what can't. The consideration of the operational impact that a test perturbation is expected to have on the process tends to override any other consideration, including the quality of information it produces.
Single Input Testing vs. Multiple Input Testing: One also has to decide whether to vary each input one at a time or several inputs together at the same time. Generally, the one-input-at-a-time approach is safer, as its impact on the process is more predictable and controllable. However, varying the inputs together yields more information for a given amount of time. In addition, for highly interactive systems (with strong gain directionality characteristics), it is very difficult to obtain the necessary multivariable information using single input tests (see the example below). Single input testing inherently emphasizes the accuracy of the SISO models, which may not be enough. If multiple input testing is used, both the auto-correlation and the cross-correlation functions of the test signals should be optimized in the design.
Open-Loop Testing vs. Closed-Loop Testing: Finally, the experiment can be conducted in open loop or in closed loop with some feedback controller in place. This is illustrated in Figure 12.1. In principle, closed-loop testing would give more relevant information, as it more closely simulates the situation the model will be subjected to. On the other hand, it may not always be possible to implement a reasonable feedback controller before model identification. If the closed-loop testing option is chosen, one needs to be cognizant of two issues. First, in all likelihood, naturally occurring disturbances alone will not be enough to produce data with the necessary information content. For instance, it can be seen from Figure 12.1(b) that, without an external excitation signal, one could easily end up identifying (the inverse of) the controller rather than the process. Second, the feedback loop introduces correlation between past output and future input in the data. Some provision may be necessary, as most conventional algorithms give biased results in such a case (see the discussion on the convergence of linear least squares in Section 12.2.3, for example). For instance, one could identify a closed-loop transfer function between the external dither signal and the output and then extract an open-loop model from it.
Example 12.5 ADD THE EXAMPLE OF HIGH PURITY DISTILLATION
COLUMN. Single-input testing vs. multiple-input testing. Demonstrate the
importance of correlated design.
12.5.2 PRBS Signals

The PRBS is a popular choice for the test signal because it is easy to generate, convenient to implement, and it guarantees a specific power spectrum even for a relatively short-length signal (compared to a random signal). In this subsection, we examine its properties and design in more detail.
A PRBS (denoted by \nu hereafter) is a periodic binary signal taking some upper value (say B) and some lower value (-B). Its covariance function R_\nu(\tau) (calculated assuming the input is a periodic signal of period M) is

\[ R_\nu(\tau) = \begin{cases} B^{2} & \text{for } \tau = 0 \\ -\dfrac{B^{2}}{M} & \text{for } \tau \neq 0 \end{cases} \qquad (12.120) \]

(with \tau taken modulo M). The corresponding power spectrum is

\[ \Phi_\nu(\omega) = \sum_{\tau=-\infty}^{\infty} R_\nu(\tau)\, e^{-j\omega\tau} \approx B^{2} + 2B^{2}\cos(M\omega) + 2B^{2}\cos(2M\omega) + \cdots \]

(neglecting the -B^{2}/M terms). The above power spectrum is zero except at the discrete frequency points \omega_i = \frac{2\pi i}{M}. The power at these frequency points is the same for all of them, but infinite. This makes sense, since the integral of the power spectrum over (0, 2\pi) must be R(0) (which is the power of the signal) and the power spectrum is nonzero only at discrete points. For periodic signals, it is convenient to use the following DFT of the covariance function:

\[ \bar{\Phi}(\omega_i) = \sum_{\tau=0}^{M-1} R(\tau)\, e^{-j\omega_i \tau} \approx R(0) = B^{2}. \]

Hence, we see that a PRBS distributes its power uniformly over the discrete frequency points within the Nyquist band.

Note: The power distribution will not be exactly uniform, since the DFT of the covariance function shows the distribution of the power when the discrete signal is implemented as an impulse train; in reality, one would implement it as a zero-order-hold signal.

An advantage is that, being a deterministic signal, this statistical property holds even for a relatively short-length signal. For example, with a PRBS signal of length N that is an integer multiple of M,

\[ \frac{1}{N} \sum_{k=1}^{N} \nu(k)\nu(k-\tau) = \begin{cases} B^{2} & \text{for } \tau = 0 \\ -\dfrac{B^{2}}{M} & \text{for } \tau \neq 0. \end{cases} \qquad (12.121) \]
For purely random signals, in order for the above property to hold approximately, N must be quite large. Since one is often limited to collecting only short-length data in practice, the above property of the PRBS can be very useful.

A PRBS can be generated with a set of shift registers and binary (modulo-2) summation:

1 \oplus 1 = 0, \quad 1 \oplus 0 = 1, \quad 0 \oplus 1 = 1, \quad 0 \oplus 0 = 0.

Using this method, we can obtain a periodic binary sequence of length 2^{N_s} - 1, where N_s denotes the number of shift registers. From the (0,1) binary sequence, a PRBS signal can be generated by choosing the lower value for 0 and the upper value for 1. Once we obtain the signal for one period, we can simply repeat it for however many periods are needed. The procedure for PRBS generation can be summarized as follows (a sketch in code is given after the list):

1. Initialize the register values by setting them to either 1 or 0. Different initializations lead to different signals.
2. Perform the binary summation of the first register value and the final register value.
3. Shift all register values to the right by one register.
4. Enter the summation result into the first register.
5. Repeat Steps 2-4 until 2^{N_s} - 1 data points are obtained for each register. One can choose the data history of any register as the PRBS.
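Here is a minimal Python sketch of the procedure; note that maximum-length behavior requires feedback taps appropriate for N_s, and summing the first and last registers, as in the text, happens to work for N_s = 4.

```python
# Shift-register (LFSR) generation of a PRBS; taps {first, last} give a
# maximum-length sequence for Ns = 4 (period 2**4 - 1 = 15).
import numpy as np

def prbs(Ns=4, B=1.0, init=None):
    reg = list(init) if init is not None else [1] * Ns
    out = []
    for _ in range(2**Ns - 1):
        out.append(reg[-1])                 # read one register's history
        new = reg[0] ^ reg[-1]              # binary (mod-2) summation
        reg = [new] + reg[:-1]              # shift right, insert the result
    return np.array([B if b else -B for b in out])

v = prbs()                                  # one period of length 15
signal = np.tile(v, 10)                     # repeat for as long as needed
```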
Generation of Multiple Independent PRBS Signals

If it is desired to do a multiple-input test (by varying several inputs simultaneously), one may wish to design several PRBSs that are mutually independent. There are different ways to do this. Let us discuss the two-signal design case. One option is to excite the two channels first in the (1,1) direction and then in the (1,-1) direction, using a PRBS \nu of period M:

Option 1: \quad u_1 = \begin{bmatrix} \nu \\ \nu \end{bmatrix}, \qquad u_2 = \begin{bmatrix} \nu \\ -\nu \end{bmatrix}.

Another option is to design a PRBS \nu of period 2M, split it into two halves \nu_1 and \nu_2, and shift:

Option 2: \quad u_1 = \begin{bmatrix} \nu_1 \\ \nu_2 \end{bmatrix}, \qquad u_2 = \begin{bmatrix} \nu_2 \\ \nu_1 \end{bmatrix}.

The latter also gives a two-dimensional signal of period 2M, and it is expected to be more robust when only a part of the basic signal ends up being used for some unexpected reason: in such a case, with Option 1, one may end up implementing just the portion of the test signal that excites the input channels in only the single direction (1,1).
One can easily extend the above idea to higher-dimensional cases. For example, for an n_u-input-channel design, we can create a PRBS of period n_u M, split it into segments \nu_1, \ldots, \nu_{n_u} of length M, and choose

u_1 = \begin{bmatrix} \nu_1 \\ \nu_2 \\ \vdots \\ \nu_{n_u} \end{bmatrix}, \qquad u_2 = \begin{bmatrix} \nu_2 \\ \vdots \\ \nu_{n_u} \\ \nu_1 \end{bmatrix}, \quad \text{etc.}

What happens to the cross-variance between inputs with the above design? All cross-variance terms become -B^{2}/M. So, by choosing M to be a sufficiently large number, the cross-correlation between the input signals can be made negligible. As before, it is recommended that M be chosen to correspond to the system's maximum settling time (with respect to all the inputs); this is because the number of identifiable parameters with the above design is M per input channel (M n_u in total).
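A sketch of the shifted construction using the prbs helper above (the shift of half a period and the number of repetitions are our own illustrative choices; delayed copies of the same m-sequence have small cross-correlation for nonzero relative shifts, which is what makes the channels nearly independent):

```python
# Two near-independent test channels from one PRBS via shifting.
import numpy as np

base = prbs(Ns=6)                 # one period, M = 2**6 - 1 = 63
u1 = np.tile(base, 4)             # repeat for the desired test length
u2 = np.roll(u1, len(base) // 2)  # same signal delayed by about M/2
```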
Bias and Variance of PRBS

The key parameters one has to choose in the PRBS generation are the upper value B, the lower value -B, the period M, and the number of periods n_p. A larger amplitude B yields a higher signal-to-noise ratio, which should help the identification in most cases. However, as with everything, too much can be bad, since it can accentuate the nonlinear characteristics of the input-output responses and adversely affect the ongoing plant operation as well. Finding a reasonable compromise requires some intimate knowledge of the process.
Period and Number of Repetitions

Note that M n_p represents the total duration of the experiment (in samples). If M is chosen too small, the power does not get distributed to enough frequency points. In addition, the value B^{2}/M may be significant, resulting in a covariance function that does not closely match that of white noise. Note that the PRBS signal is persistently exciting of order M; hence, it can be used to identify only M parameters. If an FIR model is to be identified, this implies that M should be chosen to be equal to or larger than the number of sample steps corresponding to the system's settling time. In certain cases, it may be undesirable to distribute the power to too many frequency points; M should be limited in this case, but n_p can always be increased to produce a signal of the desired length.
To Vary the Bias Value or Not to Vary It?

In some cases, it may be advantageous to vary the bias value from one period to the next. This is true when the steady-state value of the input cannot be fixed because it changes according to some load disturbance. If an identification experiment is performed over a long time, the load disturbance may change even during the experiment. This case requires a test input with its power density concentrated more in the low-frequency range, which can be achieved by varying the bias value. One can still use the same basic PRBS design, but an extra bias (chosen to be different for each period) is added to the design.
12.5.3 Data Pre-Processing

Raw plant data (y_p(k), u_p(k)) are usually converted to deviation variables and detrended before identification. There are several options:

Option 1 Subtract the sample averages computed from the data, i.e.,

\[ y(k) = y_p(k) - \bar{y}, \qquad u(k) = u_p(k) - \bar{u}. \qquad (12.122) \]

Option 2 Use the given steady-state values of the variables instead to compute the deviation variables, i.e.,

\[ y(k) = y_p(k) - y_{ss}, \qquad u(k) = u_p(k) - u_{ss}, \qquad (12.123) \]

where y_{ss} and u_{ss} represent a priori given steady-state values of the process output and input respectively.

Option 3 Difference the data by subtracting from each datum its value at the previous sample time:

\[ y(k) = y_p(k) - y_p(k-1), \qquad u(k) = u_p(k) - u_p(k-1). \qquad (12.124) \]
In some cases, part of the noise model may be known ahead of time, i.e., H(q,\theta) = H_1(q) H_2(q,\theta), where H_1(q) represents the known (or given) part of the noise model and H_2(q,\theta) the unknown part. In this case, one can write

\[ \frac{1}{H_1(q)}\, y(k) = G(q,\theta)\, \frac{1}{H_1(q)}\, u(k) + H_2(q,\theta)\, \varepsilon(k). \qquad (12.125) \]

Hence, the input-output data can be prefiltered with 1/H_1(q) to put the model structure in the standard form of ARX, ARMAX, OE, etc.
Note that, by electing to detrend via Option 3, one implicitly assumes that H(q) includes an integrator. A reasonable form of noise model in many process control problems is H_1(q) = \frac{A_o(q)}{1 - q^{-1}}, where A_o(q) is some stable polynomial. In the absence of better knowledge, one can choose A_o(q) = 1 - \alpha q^{-1}, where \alpha < 1 is chosen to correspond to the desired closed-loop time constant. In this case, the prefiltering should be done using the filter \frac{1 - q^{-1}}{A_o(q)}. This amounts to filtering the differenced input-output data with the low-pass filter \frac{1}{A_o(q)}, which makes sense in view of the fact that differencing the data over-emphasizes the high-frequency fit.
Sometimes, one would like to minimize a filtered prediction error, L(q)e_{\mathrm{pred}}(k,\theta), instead. This is often done to emphasize or de-emphasize the fit in a certain frequency range over the others. In this case, one can simply prefilter the input-output data with L(q) before applying the PEM to obtain the desired effect.
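The detrending options and the (1-q^{-1})/A_o(q) prefilter can be sketched in Python as follows (alpha and the option chosen are design decisions, not fixed by the text):

```python
# Detrending Options 1-3 and the (1 - q^-1)/Ao(q) prefilter discussed above.
import numpy as np
from scipy.signal import lfilter

def detrend_mean(x):    return x - np.mean(x)            # Option 1
def detrend_ss(x, ss):  return x - ss                    # Option 2
def difference(x):      return np.r_[0.0, np.diff(x)]    # Option 3

def prefilter(x, alpha=0.8):
    # filter (1 - q^-1)/(1 - alpha q^-1): difference, then low-pass 1/Ao(q)
    return lfilter([1, -1], [1, -alpha], x)
```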
12.5.4 Model Fitting

Bias If the deterministic part of the system is represented by B(q)/A(q) of the specified order and the noise part of the model cannot be explained by 1/A(q), both parts will show bias (which is distributed between the two according to the signal-to-noise ratio). In this case, increasing the order of A(q) may improve the result, but only to a point, since this will also introduce more parameters and increase the parameter variance. If one starts seeing near pole-zero cancellations between A(q) and B(q) while significant auto-correlation still exists in the residual, it is a good bet that the noise part of the system is undermodelled. In this case, using an ARMAX structure can help reduce the error and lower the system order.

The bias problem is not expected for FIR identification and the ETFE. However, complex noise patterns can still pose a problem in terms of the convergence rate. If the noise patterns can be modeled a priori, it helps to pre-filter the data before performing the identification.

A similar statement holds for subspace identification: if the maximum order is chosen too small compared to the real system order, it can give a bias.
Large Parameter Variances If the amount of data used is insufficient for the given noise level and number of parameters, the parameter variance can be very large. In this case, the residuals from the model fitting may look good, but the prediction performance of the model on validation data may be poor. In addition, there may not be good agreement among different model types. The variance problem is expected to be most severe for FIR identification and the ETFE, but it can also show up in ARX model identification if the order chosen is very high (in an attempt to explain the noise part). Subspace identification too can suffer from this if the selected model order (both the upper bound and the final order chosen on the basis of the SVD) is very high.

Feedback Within Data If feedback is present within the data (e.g., due to an existing control system), it can introduce a bias into the model. The bias will manifest itself as the model's poor prediction performance on validation data. This problem is expected to be most severe for FIR identification, subspace identification, and the ETFE. The problem should be less severe for the PEM with an ARX structure if the chosen ARX structure is sufficiently rich.
Nonlinearity If the nonlinearity is severe, it will hamper the identification even when the objective is to obtain the most accurate linear system model. Rather than directly proceeding to complex nonlinear model structures (e.g., neural nets), it may be advantageous to consider the underlying physics and see if there are simple variable transformations (e.g., taking square roots or logarithms) that make sense.
poor noise modeling even though the convergence rate can still be affected by
it.
Example 12.6 Take the reader through a simple example?
12.5.5
Model Quality Assessment and an Integrated Framework for System Identification and Controller Design
Figure 12.3: A Framework for Integrated Identification, Model Quality Assessment, and Controller Design
Examples to Include

1. Demonstrate a loss of identifiability in a MIMO ARMAX system.
2. Example of pseudo-linear regression vs. nonlinear optimization.
3. Example of two input signals giving different information matrices (well-conditioned vs. ill-conditioned). Do a Monte Carlo simulation and compute the average of the error with respect to different linear directions.
4. Bayesian estimator vs. least squares in FIR identification.
5. Example of the instrumental variable method.
6. Subspace identification applied to some simple but realistic system. Number of data points.
7. Subspace identification applied to closed-loop data. Ljung's method.
8. Multi-input testing vs. single-input testing. High-purity distillation column.
9. Effect of detrending. Integrated white noise disturbance or unknown bias.
10. Going through some simple example (with significant noise) as discussed in the Model-Fitting section.
6. Derive the Bayesian estimator for system (12.68) using the Kalman filter theory.

7. Show that \hat{x}_i(k) in the subspace identification is a linear function of y(k-1), \ldots, y(k-i) and u(k-1), \ldots, u(k-i). Derive the explicit relationship (expressions for H_1 and H_2).
Bibliography

1. Detailed treatment of parametric identification methods for various model parameterizations can be found in Ljung (1987) and Soderstrom (REF).
2. PEM - developed mostly by Ljung and coworkers.
3. Frequency-domain bias distribution: the result derived here is asymptotic in nature; a similar result also exists for finite data sets (Ljung, 1987). (CHECK!)
4. Instrumental variable method references. Popular choices of instruments (Table ?? in Ljung).
5. PCR, PLS references.
6. ETFE - mostly drawn from Ljung; details can be found there. Smoothing functions (Table ??).
7. The subspace identification method originated from the classical realization theory presented in Ho and Kalman (CITE) and Kung (CITE). The subspace identification algorithm that we described is called N4SID and was originally proposed by Van Overschee and De Moor (REF). The reference describes the modifications needed when the input is not white but correlated in time, as well as an RQ-factorization-based numerically stable and efficient algorithm. Other subspace algorithms available in the literature are based on similar principles but differ in how the data for \hat{x}_i(k) and \hat{x}_{i+1}(k+1) are constructed. For instance, in Larimore's CVA method [?], the basis for the states is chosen by considering the statistical correlation between Y_i^{0+}(k) and [Y_i^{-}(k)^{T}\ U_i^{-}(k)^{T}]^{T}. Another notable algorithm in this breed is MOESP by Verhaegen (CITE). These algorithms (and their variants) have already appeared in commercial software packages (e.g., the MATLAB System Identification Toolbox) and have also been embedded into some of the identification packages used with MPC.
8. Closed-loop subspace ID methods. Ljung. Chou and Verhaegen.
9. PRBS and signal design. Reference the book.
10. Importance of multi-input testing and design methods for highly interactive systems. Koung and MacGregor, Cooley and Lee, etc.