
APPLICATIONES MATHEMATICAE

22,1 (1993), pp. 25–38

L. STETTNER (Warszawa)

ERGODIC CONTROL OF PARTIALLY OBSERVED
MARKOV PROCESSES WITH EQUIVALENT
TRANSITION PROBABILITIES

Abstract. Optimal control with long run average cost functional of a partially observed Markov process is considered. Under the assumption that the transition probabilities are equivalent, the existence of a solution to the Bellman equation is shown, with the use of which optimal strategies are constructed.

1. Introduction. Let (Ω, F, P) be a probability space and (x_n) a discrete time controlled Markov process on a compact state space E, endowed with the Borel σ-field E, with transition kernel P^v(x, dz) for v ∈ U, where (U, U) is a compact space of control parameters. Assume the only observations of x_n are R^d-valued random variables y_1, ..., y_n such that for Y_n = σ{y_1, ..., y_n} we have
(1)  P{y_{n+1} ∈ A | x_{n+1}, Y_n} = P{y_{n+1} ∈ A | x_{n+1}} = ∫_A r(x_{n+1}, y) dy

for n = 0, 1, ... with r : E × R^d → R_+ a measurable function, and A ∈ B(R^d), the family of Borel subsets of R^d.
The Markov process (x_n) is controlled by a sequence (a_n) of Y_n-measurable U-valued random variables. The best mean square approximation of x_n based on the available observation is given by a filtering process π_n, defined as a measure valued process such that for A ∈ E,

(2)  π_n(A) = P{x_n ∈ A | Y_n} for n = 1, 2, ...,

and π_0(A) = µ(A), where µ is the initial law of (x_n).

1991 Mathematics Subject Classification: Primary 93E20; Secondary 93E11.


Key words and phrases: stochastic control, partial observation, long run average cost,
Bellman equation.

The following lemma gives the most general formula for πn . Its proof,
unlike those in [5] and [8], which have more restrictive hypotheses, is not
based on the reference probability method.

Lemma 1. Under (1), for n = 0, 1, ... and A ∈ E we have

(3)  π_{n+1}(A) = [∫_A r(z_2, y_{n+1}) ∫_E P^{a_n}(z_1, dz_2) π_n(dz_1)] / [∫_E r(z_2, y_{n+1}) ∫_E P^{a_n}(z_1, dz_2) π_n(dz_1)].

Proof. Denote the right hand side of (3) by M^{a_n}(y_{n+1}, π_n)(A). Let F : (R^d)^n → R be a bounded measurable function, Y_n = (y_1, ..., y_n) and C ∈ B(R^d).

By (1), Fubini's theorem and properties of conditional expectations we have

∫_Ω M^{a_n}(y_{n+1}, π_n)(A) χ_C(y_{n+1}) F(Y_n) dP
  = ∫_Ω E[M^{a_n}(y_{n+1}, π_n)(A) χ_C(y_{n+1}) | x_{n+1}, Y_n] F(Y_n) dP
  = ∫_Ω ∫_C M^{a_n}(y, π_n)(A) r(x_{n+1}, y) dy F(Y_n) dP
  = ∫_Ω ∫_C M^{a_n}(y, π_n)(A) E[E[r(x_{n+1}, y) | Y_n, x_n] | Y_n] dy F(Y_n) dP
  = ∫_Ω ∫_C M^{a_n}(y, π_n)(A) E[∫_E r(z, y) P^{a_n}(x_n, dz) | Y_n] dy F(Y_n) dP
  = ∫_Ω ∫_C M^{a_n}(y, π_n)(A) ∫_E ∫_E r(z, y) P^{a_n}(z_1, dz) π_n(dz_1) dy F(Y_n) dP
  = ∫_Ω ∫_C ∫_A ∫_E r(z_2, y) P^{a_n}(z_1, dz_2) π_n(dz_1) dy F(Y_n) dP
  = ∫_Ω ∫_E ∫_A ∫_C r(z_2, y) dy P^{a_n}(z_1, dz_2) π_n(dz_1) F(Y_n) dP
  = ∫_Ω E[∫_A ∫_C r(z_2, y) dy P^{a_n}(x_n, dz_2) | Y_n] F(Y_n) dP
  = ∫_Ω E[E[∫_C r(x_{n+1}, y) dy χ_A(x_{n+1}) | Y_n, x_n] | Y_n] F(Y_n) dP
  = ∫_Ω ∫_C r(x_{n+1}, y) dy χ_A(x_{n+1}) F(Y_n) dP
  = ∫_Ω E[χ_C(y_{n+1}) | Y_n, x_{n+1}] χ_A(x_{n+1}) F(Y_n) dP
  = ∫_Ω χ_C(y_{n+1}) χ_A(x_{n+1}) F(Y_n) dP = ∫_Ω π_{n+1}(A) χ_C(y_{n+1}) F(Y_n) dP.

Therefore, by the definition of conditional expectation, (3) follows.


The class of controls an = u(πn ), where u is a fixed measurable, U -valued
function, is of special interest. Namely, we have
Lemma 2. Under (1), if additionally a_n = u(π_n) with u a fixed measurable function from the space P(E) of probability measures on E, endowed with the topology of weak convergence, into (U, U), then π_n is a Y_n-Markov process with transition operator

(4)  Π^{u(ν)}(ν, F) = ∫_E ∫_{R^d} F(M^{u(ν)}(y, ν)) r(z, y) dy ∫_E P^{u(ν)}(z_1, dz) ν(dz_1)

where

(5)  M^v(y, ν)(A) = [∫_A r(z, y) ∫_E P^v(z_1, dz) ν(dz_1)] / [∫_E r(z, y) ∫_E P^v(z_1, dz) ν(dz_1)]

for v ∈ U, ν ∈ P(E) and F : P(E) → R bounded measurable.
Proof. By (1) we easily obtain

E[F(π_{n+1}) | Y_n] = E[F(M^{u(π_n)}(y_{n+1}, π_n)) | Y_n]
  = E[E[F(M^{u(π_n)}(y_{n+1}, π_n)) | Y_n, x_{n+1}] | Y_n]
  = E[∫_{R^d} F(M^{u(π_n)}(y, π_n)) r(x_{n+1}, y) dy | Y_n]
  = E[∫_{R^d} E[F(M^{u(π_n)}(y, π_n)) r(x_{n+1}, y) | Y_n, x_n] dy | Y_n]
  = E[∫_{R^d} ∫_E F(M^{u(π_n)}(y, π_n)) r(z, y) P^{u(π_n)}(x_n, dz) dy | Y_n]
  = ∫_E ∫_{R^d} F(M^{u(π_n)}(y, π_n)) r(z, y) dy ∫_E P^{u(π_n)}(z_1, dz) π_n(dz_1)
  = Π^{u(π_n)}(π_n, F).

Thus (π_n) is Markov with transition operator of the form (4).
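For a finite state space E = {1, ..., N} the update (3)/(5) reduces to matrix–vector arithmetic: propagate the belief through the transition matrix, reweight by the observation densities, normalize. A minimal sketch of one filter step (the kernel P and the density values below are illustrative assumptions, not data from the paper):

```python
import numpy as np

def filter_step(nu, P_a, r_y):
    """One step of the filter recursion (3)/(5) on a finite state space.

    nu  : current belief pi_n, shape (N,)
    P_a : transition matrix P^{a_n}, shape (N, N)
    r_y : observation densities r(x, y_{n+1}) evaluated at the newly
          observed y_{n+1}, shape (N,)
    Returns pi_{n+1}, shape (N,).
    """
    predicted = nu @ P_a          # law of x_{n+1} given Y_n: sum_{z1} nu(z1) P(z1, .)
    unnorm = r_y * predicted      # reweight by the observation density
    return unnorm / unnorm.sum()  # normalize, as in (3)

# Illustrative two-state example (all numbers invented):
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r_y = np.array([0.5, 2.0])        # r(1, y), r(2, y) at the observed y
nu = np.array([0.5, 0.5])
pi_next = filter_step(nu, P, r_y)
print(pi_next)                    # a probability vector favoring state 2
```

Iterating `filter_step` along an observation path realizes the process (π_n) whose Markov property Lemma 2 establishes.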

In this paper we are interested in minimizing the following long run average cost functional:

(6)  J_µ((a_n)) = lim sup_{n→∞} n^{-1} E_µ{ Σ_{i=0}^{n-1} c(x_i, a_i) }

over all U-valued, Y_n-adapted processes a_n, with c : E × U → R_+ a given bounded measurable cost function.
By the very definition of a filtering process we have

(7)  J_µ((a_n)) = lim sup_{n→∞} n^{-1} E_µ{ Σ_{i=0}^{n-1} ∫_E c(z, a_i) π_i(dz) }.
The optimal strategies for the cost functional J_µ are constructed with the use of a suitable Bellman equation, the solution of which is found as a limit of w^β(ν) = ϑ^β(ν) − inf_{µ∈P(E)} ϑ^β(µ) as β → 1, where ϑ^β is the value function of the β-discounted cost functional. Since our limit results are based on compactness arguments, obtained via the Ascoli–Arzelà theorem, in Section 2 we show the continuity of ϑ^β. Then in Section 3 we prove the uniform boundedness of w^β. Using the concavity of w^β, obtained from the concavity of ϑ^β, proved in Section 2, we get equicontinuity of w^β, which allows us to use the Ascoli–Arzelà theorem.
The discrete time ergodic optimal control problem with partial observation was studied in [1], [2], [3], [6], [8], [9]. In [1] and [8] the observation was corrupted with white noise. In addition, in [1] there was a finite state space and a rich observation structure. In [8] the state space was general but there were some restrictions on controls. The papers [2] and [3] contain a general theory, but the fundamental example used is a very simple maintenance-replacement model. In [6] a model with a finite state space and almost steady state transition probabilities was studied. Finally, finite state space semi-Markov decision processes with a completely observable state were considered in [9]. Our paper generalizes [6] in various directions. Namely, we have a general, compact state space. Although the techniques used to show the boundedness and the equicontinuity of w^β follow in some sense the arguments of [6], by a more detailed estimation we obtain the results under assumptions much less restrictive than the corresponding ones in [6], even when E is finite.

2. Discounted control problem. In this section we characterize the value function ϑ^β of the discounted cost functional J^β_µ defined as follows:

(8)  J^β_µ((a_n)) := E_µ{ Σ_{i=0}^∞ β^i c(x_i, a_i) } = E_µ{ Σ_{i=0}^∞ β^i ∫_E c(z, a_i) π_i(dz) }

with β ∈ (0, 1).



The theorem below provides a complete solution to the discounted partially observed control problem.

Theorem 1. Assume (1) and

(A1) c : E × U → R_+ is continuous,

(H1) for F ∈ C(P(E)), the space of continuous functions on P(E), if µ_n ⇒ µ, i.e. µ_n converges weakly in P(E) to µ, we have

(9)  sup_{a∈U} |Π^a(µ_n, F) − Π^a(µ, F)| → 0 as n → ∞,

(H2) for F ∈ C(P(E)), if U ∋ a_n → a we have

(10)  Π^{a_n}(µ, F) → Π^a(µ, F).
Then

(11)  ϑ^β(µ) := inf_{(a_n)} J^β_µ((a_n))

is a continuous function of µ ∈ P(E) and is a unique solution to the Bellman equation

(12)  ϑ^β(µ) = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ^β) ].

There exists a measurable selector u_β : P(E) → (U, U) for which the infimum on the right hand side of (12) is attained. Moreover, we have

(13)  ϑ^β(µ) = J^β_µ((u_β(π_n))).

In addition, ϑ^β can be uniformly approximated from below by the sequence

(14)  ϑ^β_0(µ) ≡ 0,  ϑ^β_{n+1}(µ) = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ^β_n) ],

and each ϑ^β_n is concave, i.e. for µ, ν ∈ P(E) and α ∈ [0, 1],

(15)  ϑ^β_n(αµ + (1 − α)ν) ≥ αϑ^β_n(µ) + (1 − α)ϑ^β_n(ν).
Proof. We only point out the main steps since the proof is more or less standard (for details see [4], Thm. 2.2). Define, for ϑ ∈ C(P(E)),

T ϑ(µ) = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ) ].

By (A1) and (H1), T is a contraction on C(P(E)). Thus, by the Banach principle there is a unique fixed point ϑ^β of T, which is a unique solution to the Bellman equation (12). Since by (A1) and (H2) the map

U ∋ a → ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ^β)

is continuous, there exists a measurable selector u_β. The identity (13) is then almost immediate. Since T is monotonic and contractive, ϑ^β_n is increasing and converges to ϑ^β. It remains to show the concavity of ϑ^β_n. We prove this by induction. Clearly, ϑ^β_0 ≡ 0 is concave. Provided ϑ^β_n is concave, by Jensen's inequality we have for α ∈ (0, 1),

Π^a(αµ + (1 − α)ν, ϑ^β_n) ≥ αΠ^a(µ, ϑ^β_n) + (1 − α)Π^a(ν, ϑ^β_n)

and therefore from (14),

ϑ^β_{n+1}(αµ + (1 − α)ν) ≥ αϑ^β_{n+1}(µ) + (1 − α)ϑ^β_{n+1}(ν),

i.e. ϑ^β_{n+1} is concave. By induction, ϑ^β_n is concave for each n. The proof of the theorem is complete.
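When E and U are finite and the observation density r(x, y) dy is replaced by a finite observation matrix q(x, y), the operator Π^a(µ, ·) in (14) becomes a finite sum over observations weighted by their predictive probabilities, and the lower approximations ϑ^β_n can be evaluated exactly by recursion. The following toy sketch works only under those simplifying assumptions (all numbers invented); it is not the paper's general setting:

```python
import numpy as np

# Illustrative data (assumed): 2 states, 2 actions, 2 observations.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
q = np.array([[0.8, 0.2],          # q[x, y] = P(y_{n+1} = y | x_{n+1} = x)
              [0.3, 0.7]])
c = np.array([[0.0, 2.0],          # c[x, a]
              [1.0, 0.5]])
beta = 0.9

def theta_n(n, mu):
    """theta^beta_n(mu) of (14): n-step discounted value of belief mu."""
    if n == 0:
        return 0.0                                     # theta^beta_0 = 0
    best = np.inf
    for a in P:
        predicted = mu @ P[a]                          # law of x_{n+1} under mu, a
        val = mu @ c[:, a]                             # int c(x, a) mu(dx)
        for y in range(q.shape[1]):
            p_y = predicted @ q[:, y]                  # predictive prob. of y
            if p_y > 0:
                m = predicted * q[:, y] / p_y          # filter update M^a(y, mu)
                val += beta * p_y * theta_n(n - 1, m)  # Pi^a(mu, theta_{n-1})
        best = min(best, val)
    return best

mu = np.array([0.5, 0.5])
vals = [theta_n(n, mu) for n in range(5)]
print(vals)  # nondecreasing: theta_n approximates theta^beta from below (Theorem 1)
```

The recursion has exponentially many branches in n, so it only illustrates the monotone approximation; it is not a practical solver for ϑ^β.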
Below we formulate sufficient conditions for (H1) and (H2).
Proposition 1. Assume

(A2) r ∈ C(E × R^d),

(A3) for fixed a ∈ U, P^a(x, ·) is Feller, i.e. for any ϕ ∈ C(E), if x_n ⇒ x we have

(16)  P^a(x_n, ϕ) → P^a(x, ϕ),

(H3) if U ∋ a_n → a, then for each ϕ ∈ C(E),

(17)  sup_{x∈E} |P^{a_n}(x, ϕ) − P^a(x, ϕ)| → 0,

(A4) for R(z, ψ) := ∫_{R^d} r(z, y) ψ(y) dy, where ψ ∈ C(R^d), if E ∋ z_n → z we have

(18)  R(z_n, ·) ⇒ R(z, ·).

Then (H1) and (H2) are satisfied.
Proof. Notice first that from (16) and (17), if U ∋ a_n → a and µ_n ⇒ µ, we have

(19)  P^{a_n}(µ_n, ϕ) := ∫_E P^{a_n}(x, ϕ) µ_n(dx) → P^a(µ, ϕ)

as n → ∞, for ϕ ∈ C(E). Since U × P(E) is compact, to prove (H1) and (H2) it is sufficient to show that

U × P(E) ∋ (a, µ) → Π^a(µ, F) is continuous for F ∈ C(P(E)).

Therefore we shall show that

(20)  Π^{a_n}(µ_n, F) → Π^a(µ, F)

for U ∋ a_n → a, P(E) ∋ µ_n ⇒ µ and F ∈ C(P(E)). We have

(21)  |Π^{a_n}(µ_n, F) − Π^a(µ, F)|
  ≤ ∫_E |∫_{R^d} (F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))) r(z, y) dy| P^{a_n}(µ_n, dz)
  + |∫_E ∫_{R^d} F(M^a(y, µ)) r(z, y) dy (P^{a_n}(µ_n, dz) − P^a(µ, dz))|
  = I_n + II_n.

From (19), II_n → 0, provided

(22)  E ∋ z → ∫_{R^d} F(M^a(y, µ)) r(z, y) dy is continuous.

By (A4), R^d ∋ y → M^a(y, µ) ∈ P(E) is continuous. Then, again by (A4), the map (22) is continuous, and consequently II_n → 0.

If

(23)  sup_{z∈E} |∫_{R^d} (F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))) r(z, y) dy| → 0

then clearly I_n → 0.
By (A4), for each ε > 0 there exists a compact set K ⊂ R^d such that for any z ∈ E,

(24)  R(z, K^c) < ε / (2‖F‖).

Therefore

|∫_{R^d} (F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))) r(z, y) dy|
  ≤ ∫_K |F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))| r(z, y) dy + ε

and to obtain (23) it remains to show that

(25)  M^{a_n}(y, µ_n)(ϕ) → M^a(y, µ)(ϕ)

for any ϕ ∈ C(E), uniformly in y ∈ K.

Using the Stone–Weierstrass approximation theorem (see [7], Thm. 9.28, cf. also the proof of Lemma A.1.2 of [8]) and (19), we obtain
|∫_E ∫_E r(z, y) ϕ(z) P^{a_n}(z_1, dz) µ_n(dz_1) − ∫_E ∫_E r(z, y) ϕ(z) P^a(z_1, dz) µ(dz_1)|
  ≤ |∫_E r(z, y) ϕ(z) (P^{a_n}(µ_n, dz) − P^a(µ, dz))| → 0

uniformly in y ∈ K. Thus, we have uniform convergence of the numerators and denominators in the formula defining M^{a_n}, and consequently convergence of the ratios, from which (25) follows.

The proof of Proposition 1 is complete.
Remark 1. (A4) is satisfied when sup_{z∈E} r(z, y) is integrable.
Define

(26)  w^β(ν) = ϑ^β(ν) − ϑ^β(µ_β) and w^β_n(ν) = ϑ^β_n(ν) − ϑ^β_n(µ^n_β)

where µ_β = arg min ϑ^β and µ^n_β = arg min ϑ^β_n. Clearly, w^β is a solution to the equation

(27)  w^β(ν) + (1 − β)ϑ^β(µ_β) = inf_{a∈U} [ ∫_E c(x, a) ν(dx) + βΠ^a(ν, w^β) ]

and w^β_n(ν) → w^β(ν) uniformly in ν ∈ P(E). We would like to let β ↑ 1 in (27) and thus obtain a solution w(ν) to the long run average Bellman equation

(28)  w(ν) + γ = inf_{a∈U} [ ∫_E c(x, a) ν(dx) + Π^a(ν, w) ].

Since we wish to apply the Ascoli–Arzelà theorem, we have to show the boundedness and the equicontinuity of w^β for β ∈ (0, 1), which are studied successively in the next sections.

3. Boundedness of w^β. We make the following assumption:

(29)  (A5)  inf_{z,z'∈E} inf_{a,a'∈U} inf_{C∈E: P^a(z,C)>0} P^{a'}(z', C) / P^a(z, C) =: λ > 0.

We have

Proposition 2. Under (A5) and the assumptions of Theorem 1, the functions w^β(ν) are uniformly bounded for β ∈ (0, 1), ν ∈ P(E).

Proof. We improve the proof of Theorem 2 of [6]. Namely, we show by induction the uniform boundedness of w^β_n(ν) for ν ∈ P(E), β ∈ (0, 1), n = 0, 1, ... For n = 0, w^β_0(ν) ≡ 0.

Assume that for any β ∈ (0, 1), ν ∈ P(E), w^β_n(ν) ≤ L, where L ≥ ‖c‖λ^{−2}.

Let a, a' ∈ U be such that for fixed ν ∈ P(E),

(30)  w^β_{n+1}(ν) = ∫_E c(x, a) ν(dx) − ∫_E c(x, a') µ^{n+1}_β(dx) + β[Π^a(ν, ϑ^β_n) − Π^{a'}(µ^{n+1}_β, ϑ^β_n)].

For y ∈ R^d, define

m(y)(B) = M^{a'}(y, µ^{n+1}_β)(B) − λ^2 M^a(y, ν)(B)

for any B ∈ E. By (29) we have

∫_B r(z, y) ∫_E P^{a'}(z_1, dz) µ^{n+1}_β(dz_1) ≥ λ ∫_B r(z, y) ∫_E P^a(z_1, dz) ν(dz_1)
  = λ M^a(y, ν)(B) ∫_E r(z, y) ∫_E P^a(z_1, dz) ν(dz_1)
  ≥ λ^2 M^a(y, ν)(B) ∫_E r(z, y) ∫_E P^{a'}(z_1, dz) µ^{n+1}_β(dz_1)

and therefore m(y)(B) ≥ 0 for B ∈ E.

If λ = 1 we have a stationary, noncontrolled Markov chain with P^a(z, C) = η(C) for any a ∈ U, z ∈ E and some fixed η ∈ P(E), and consequently w^β_n ≡ 0 for any n = 0, 1, ... Therefore we restrict ourselves to the case λ < 1. Then (1 − λ^2)^{−1} m(y) ∈ P(E). Since

M^{a'}(y, µ^{n+1}_β) = λ^2 M^a(y, ν) + (1 − λ^2)[(1 − λ^2)^{−1} m(y)],

by concavity of ϑ^β_n we obtain

(31)  ϑ^β_n(M^{a'}(y, µ^{n+1}_β)) ≥ λ^2 ϑ^β_n(M^a(y, ν)) + (1 − λ^2) ϑ^β_n((1 − λ^2)^{−1} m(y))
and from (30) we have

(32)  w^β_{n+1}(ν) ≤ ‖c‖ + β ∫_E ∫_{R^d} ϑ^β_n(M^a(y, ν)) r(z, y) dy [ ∫_E P^a(z_1, dz) ν(dz_1) − λ^2 ∫_E P^{a'}(z_1, dz) µ^{n+1}_β(dz_1) ]
  − β(1 − λ^2) ∫_E ∫_{R^d} ϑ^β_n((1 − λ^2)^{−1} m(y)) r(z, y) dy ∫_E P^{a'}(z_1, dz) µ^{n+1}_β(dz_1)
  = ‖c‖ + β ∫_E ∫_{R^d} (ϑ^β_n(M^a(y, ν)) − ϑ^β_n(µ^n_β)) r(z, y) dy [ ∫_E P^a(z_1, dz) ν(dz_1) − λ^2 ∫_E P^{a'}(z_1, dz) µ^{n+1}_β(dz_1) ]
  − β(1 − λ^2) ∫_E ∫_{R^d} (ϑ^β_n((1 − λ^2)^{−1} m(y)) − ϑ^β_n(µ^n_β)) r(z, y) dy ∫_E P^{a'}(z_1, dz) µ^{n+1}_β(dz_1)
  ≤ ‖c‖ + βL var( ∫_E P^a(z_1, ·) ν(dz_1) − λ^2 ∫_E P^{a'}(z_1, ·) µ^{n+1}_β(dz_1) ).

By (A5), for any B ∈ E,

(33)  ∫_E P^a(z_1, B) ν(dz_1) ≥ λ^2 ∫_E P^{a'}(z_1, B) µ^{n+1}_β(dz_1).

Thus

(34)  w^β_{n+1}(ν) ≤ ‖c‖ + βL(1 − λ^2) ≤ L

and the bound L is independent of ν ∈ P(E), β ∈ (0, 1). By induction w^β_n(ν) ≤ L for any ν ∈ P(E), n = 0, 1, ..., β ∈ (0, 1). Since by the very definition w^β_n(ν) ≥ 0, and for each β, w^β_n(ν) → w^β(ν) as n → ∞, we finally obtain w^β(ν) ≤ L for ν ∈ P(E) and β ∈ (0, 1).
Remark 2. One can easily see that in the case of a finite state space E, the assumption

(35)  (A5')  inf_{z,z'∈E} inf_{a,a'∈U} inf_{x∈E: P^a(z,x)>0} P^{a'}(z', x) / P^a(z, x) > 0

also implies the boundedness of w^β. Thus Proposition 2 significantly improves Theorem 2 of [6]. This was possible because of the choice of µ^n_β in (32) as the argument of minimum of ϑ^β_n.
Remark 3. Assumption (A5) says that the transition probabilities for different controls and initial states are mutually equivalent, with Radon–Nikodym density bounded away from 0. In particular, in the case when P^a(z, C) = ∫_C g^a(z, x) η(dx) the assumption

(36)  inf_{z,z'∈E} inf_{a,a'∈U} inf_{x∈E: g^a(z,x)>0} g^{a'}(z', x) / g^a(z, x) > 0

is sufficient for (A5) to be satisfied.
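On a finite state space, (35) and (36) are minima of finitely many likelihood ratios, and the ratio over sets C in (29) is bounded below by the same constant, since a ratio of sums is at least the smallest ratio of its terms. A quick numeric check of (A5') with assumed transition matrices:

```python
import numpy as np

def lam(kernels):
    """The constant of (35)/(A5'): the smallest ratio
    P^{a'}(z', x) / P^{a}(z, x) over all pairs with P^{a}(z, x) > 0."""
    ratios = [Pq[zq, x] / Pp[zp, x]
              for Pp in kernels for Pq in kernels
              for zp in range(Pp.shape[0]) for zq in range(Pq.shape[0])
              for x in range(Pp.shape[1]) if Pp[zp, x] > 0]
    return min(ratios)

# Two assumed control kernels with strictly positive entries:
P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
P1 = np.array([[0.6, 0.4], [0.5, 0.5]])
print(lam([P0, P1]))  # strictly positive, so (A5') holds for this pair
```

A kernel with a zero entry facing a positive entry elsewhere in the same column would make the infimum 0 and violate the equivalence assumption.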



4. Main theorem. Before we formulate and prove our main result, we show the equicontinuity of w^β for β ∈ (0, 1). For this purpose we need an extra assumption:

(A6) If P(E) ∋ µ_n ⇒ µ ∈ P(E) then

sup_{a∈U} sup_{C∈E} |P^a(µ_n, C) − P^a(µ, C)| → 0

with

P^a(µ, C) := ∫_E P^a(x, C) µ(dx).

We have

Proposition 3. Under (A5), (A6) and the assumptions of Theorem 1, the family of functions w^β, β ∈ (0, 1), is equicontinuous, i.e.

(37)  ∀ε > 0 ∃δ > 0 ∀µ, µ' ∈ P(E): ϱ(µ, µ') < δ ⇒ ∀β ∈ (0, 1) |w^β(µ) − w^β(µ')| < ε,

with ϱ standing for a metric compatible with the weak convergence topology of P(E).

Proof. For ν, µ ∈ P(E) let

(38)  λ(ν, µ) := inf_{a∈U} inf_{C∈E: P^a(µ,C)>0} P^a(ν, C) / P^a(µ, C).

From (A5) and (A6), if ν ⇒ µ, then

(39)  λ(ν, µ) → 1 and λ(µ, ν) → 1.

By (27), for ν, µ ∈ P(E) we have

(40)  w^β(ν) − w^β(µ) ≤ sup_{a∈U} |∫_E c(x, a)(ν(dx) − µ(dx))| + β sup_{a∈U} (Π^a(ν, w^β) − Π^a(µ, w^β)).

By analogy with the proof of Proposition 2 define

m^a(y, µ, ν)(B) = M^a(y, µ)(B) − λ(µ, ν)λ(ν, µ) M^a(y, ν)(B)

for B ∈ E. Clearly, m^a(y, µ, ν)(B) ≥ 0 for B ∈ E, and λ(µ, ν)λ(ν, µ) ≤ 1.

If λ(µ, ν)λ(ν, µ) = 1, then w^β ≡ 0 for β ∈ (0, 1), and consequently the equicontinuity property is satisfied. Therefore assume λ^2 = λ(µ, ν)λ(ν, µ) < 1. Then by the concavity of w^β,

(41)  w^β(M^a(y, µ)) ≥ λ^2 w^β(M^a(y, ν)) + (1 − λ^2) w^β((1 − λ^2)^{−1} m^a(y, µ, ν)).

From (40),

(42)  w^β(ν) − w^β(µ) ≤ sup_{a∈U} |∫_E c(x, a)(ν(dx) − µ(dx))|
  + β sup_{a∈U} { ∫_E ∫_{R^d} w^β(M^a(y, ν)) r(z, y) dy (P^a(ν, dz) − λ^2 P^a(µ, dz))
  + ∫_E ∫_{R^d} (λ^2 w^β(M^a(y, ν)) − w^β(M^a(y, µ))) r(z, y) dy P^a(µ, dz) }
  = I + II + III.

Now

(43)  II ≤ 2‖w^β‖ sup_{a∈U} sup_{B∈E} |P^a(ν, B) − λ^2 P^a(µ, B)| = 2‖w^β‖(1 − λ(µ, ν)λ(ν, µ))

and using (41) and the nonnegativity of w^β we have

(44)  III ≤ sup_{a∈U} ∫_E ∫_{R^d} (λ^2 − 1) w^β((1 − λ^2)^{−1} m^a(y, µ, ν)) r(z, y) dy P^a(µ, dz) ≤ 0.

Interchanging ν and µ in (40)–(44) we obtain the same estimates and therefore

(45)  |w^β(ν) − w^β(µ)| ≤ sup_{a∈U} |∫_E c(x, a)(ν(dx) − µ(dx))| + 2‖w^β‖(1 − λ(µ, ν)λ(ν, µ)).


Since by the Stone–Weierstrass theorem (Thm. 9.28 of [7]) c(x, a) can be uniformly approximated on E × U by continuous functions of the form Σ_{i=1}^r c_i(x) d_i(a), from (39) we obtain

lim_{ν⇒µ} sup_{β∈(0,1)} |w^β(ν) − w^β(µ)| = 0.

Let us comment on the assumption (A6):

Remark 4. (H3) clearly follows from (A6).
Remark 5. In the case of a finite state space E = {1, ..., N}, (A6) can be written as

(46)  sup_{a∈U} Σ_{k=1}^N | Σ_{i=1}^N (s^n_i − s_i) P^a(i, k) | → 0

for s^n = (s^n_1, ..., s^n_N) → s = (s_1, ..., s_N), 0 ≤ s^n_i ≤ 1, 0 ≤ s_i ≤ 1, Σ s^n_i = 1, Σ s_i = 1, and this is satisfied since

sup_{a∈U} Σ_{k=1}^N | Σ_{i=1}^N (s^n_i − s_i) P^a(i, k) | ≤ Σ_{i=1}^N |s^n_i − s_i| → 0 as s^n → s.
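The estimate of the remark can be checked numerically for assumed matrices (all data below invented):

```python
import numpy as np

def lhs(s_n, s, kernels):
    """sup_a sum_k |sum_i (s^n_i - s_i) P^a(i, k)|, the quantity in (46)."""
    d = s_n - s
    return max(np.abs(d @ Pa).sum() for Pa in kernels)

P0 = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed control kernels
P1 = np.array([[0.5, 0.5], [0.5, 0.5]])
s = np.array([0.5, 0.5])
s_n = np.array([0.6, 0.4])
print(lhs(s_n, s, [P0, P1]), np.abs(s_n - s).sum())  # lhs bounded by sum_i |s^n_i - s_i|
```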
Remark 6. Assume P^a(z, C) = ∫_C g^a(z, x) η(dx) for C ∈ E and that the mapping

(47)  U × E × E ∋ (a, z, x) → g^a(z, x) is continuous.

Then (A6) is satisfied.

In fact, by the Stone–Weierstrass theorem we can approximate g^a uniformly on U × E × E by continuous functions of the form Σ_{i=1}^k b_i(a) c_i(z) d_i(x), and

sup_{a∈U} sup_{C∈E} |P^a(µ_n, C) − P^a(µ, C)| ≤ sup_{a∈U} ∫_E | ∫_E g^a(z, x)(µ_n(dz) − µ(dz)) | η(dx)
  ≤ ε + Σ_{i=1}^k sup_{a∈U} |b_i(a)| ∫_E |d_i(x)| η(dx) | ∫_E c_i(z)(µ_n(dz) − µ(dz)) | → ε

as n → ∞.
Now we can prove our main result:

Theorem 2. Assume (A1)–(A6). Then there exist w ∈ C(P(E)) and a constant γ which are solutions to the Bellman equation

(48)  w(µ) + γ = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + Π^a(µ, w) ].

Moreover, there exists u : P(E) → U for which the infimum on the right hand side of (48) is attained. The strategy a_n = u(π_n) is optimal for J_µ and

(49)  J_µ((u(π_n))) = γ.

Proof. By Theorem 1, each ϑ^β_n is concave. Therefore w^β_n is concave and w^β, as the limit of w^β_n, is also concave. Since by Proposition 2 the w^β are uniformly bounded, and by Proposition 3 equicontinuous, from the Ascoli–Arzelà theorem the family w^β, β ∈ (0, 1), is relatively compact in C(P(E)). Moreover, |(1 − β)ϑ^β(µ_β)| ≤ ‖c‖. Therefore one can choose a subsequence β_k → 1 such that

(1 − β_k)ϑ^{β_k}(µ_{β_k}) → γ and w^{β_k} → w in C(P(E)) as k → ∞.

Letting β_k → 1 in (27) we obtain (48).

The remaining assertion of the theorem follows easily from Theorem 3.2.2 of [4].
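The vanishing-discount step of the proof can be watched numerically in the degenerate fully observed case: when the observation reveals the state, the filter is a point mass and (12) reduces to the finite-state discounted Bellman equation, so (1 − β)ϑ^β(µ_β) can be computed directly as β ↑ 1. A toy sketch with invented data:

```python
import numpy as np

# Assumed 2-state, 2-action data (full observation, so beliefs are point masses).
P = np.stack([np.array([[0.9, 0.1], [0.2, 0.8]]),    # P^{a=0}
              np.array([[0.5, 0.5], [0.5, 0.5]])])   # P^{a=1}
c = np.array([[0.0, 2.0],                            # c[x, a]
              [1.0, 0.5]])

def discounted_value(beta, iters=20000):
    """Value iteration for the discounted Bellman equation on point masses."""
    v = np.zeros(2)
    for _ in range(iters):
        # v(x) = min_a [ c(x, a) + beta * sum_z P^a(x, z) v(z) ]
        v = np.min(c + beta * np.einsum('axz,z->xa', P, v), axis=1)
    return v

for beta in (0.9, 0.99, 0.999):
    v = discounted_value(beta)
    print(beta, (1 - beta) * v.min())   # (1 - beta) * min theta^beta approaches gamma
```

The printed values stabilize as β ↑ 1, mirroring the choice of the subsequence β_k in the proof.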

References

[1] G. B. Di Masi and L. Stettner, On adaptive control of a partially observed Markov chain, Applicationes Math., to appear.
[2] E. Fernandez-Gaucherand, A. Arapostathis and S. I. Marcus, Adaptive control of a partially observed controlled Markov chain, in: Stochastic Theory and Adaptive Control, T. E. Duncan and B. Pasik-Duncan (eds.), Lecture Notes in Control and Inform. Sci. 184, Springer, 1992, 161–171.
[3] —, —, —, On partially observable Markov decision processes with an average cost criterion, Proc. 28th CDC, Tampa, Florida, 1989, 1267–1272.
[4] O. Hernandez-Lerma, Adaptive Markov Control Processes, Springer, New York, 1989.
[5] H. Korezlioglu and G. Mazziotto, Estimation récursive en transmission numérique, Proc. Neuvième Colloque sur le Traitement du Signal et ses Applications, Nice, 1983.
[6] M. Kurano, On the existence of an optimal stationary J-policy in non-discounted Markovian decision processes with incomplete state information, Bull. Math. Statist. 17 (1977), 75–81.
[7] H. L. Royden, Real Analysis, Macmillan, New York, 1968.
[8] W. J. Runggaldier and L. Stettner, Nearly optimal controls for stochastic ergodic problems with partial observation, SIAM J. Control Optim. 31 (1993), 180–218.
[9] K. Wakuta, Semi-Markov decision processes with incomplete state observation—average cost criterion, J. Oper. Res. Soc. Japan 24 (1981), 95–108.

LUKASZ STETTNER
INSTITUTE OF MATHEMATICS
POLISH ACADEMY OF SCIENCES
P.O. BOX 137
00-950 WARSZAWA, POLAND
E-mail: STETTNER@IMPAN.IMPAN.GOV.PL

Received on 22.6.1992
