Professional Documents
Culture Documents
L. S T E T T N E R (Warszawa)
The following lemma gives the most general formula for πn . Its proof,
unlike those in [5] and [8], which have more restrictive hypotheses, is not
based on the reference probability method.
where
r(z, y) E P v (z1 , dz) ν(dz1 )
R R
v A
(5) M (y, ν)(A) = R R
E
r(z, y) E P v (z1 , dz) ν(dz1 )
for v ∈ U , ν ∈ P(E) and F : P(E) → R bounded measurable.
P r o o f. By (1) we easily obtain
E[F (πn+1 ) | Yn ]
= E[F (M u(πn ) (yn+1 , πn )) | Yn ]
= E[E[F (M u(πn ) (yn+1 , πn )) | Yn , xn+1 ] | Yn ]
h R i
=E F (M u(πn ) (y, πn ))r(xn+1 , y) dy Yn
Rd
h R i
=E E[F (M u(πn ) (y, πn ))r(xn+1 , y) | Yn , xn ] dy Yn
Rd
h R R i
u(πn ) u(πn )
=E F (M (y, πn ))r(z, y)P (xn , dz) dy Yn
Rd E
R R R
= F (M u(πn ) (y, πn )) r(z, y) P u(πn ) (z1 , dz) πn (dz1 ) dy
E Rd E
= Π u(πn ) (πn , F ) .
Thus (πn ) is Markov with transition operator of the form (4).
28 L. Stettner
There exists a measurable selector uβ : P(E) → (U, U) for which the infimum
on the right hand side of (12) is attained. Moreover , we have
(13) ϑβ (µ) = Jµβ ((uβ (πn ))) .
In addition, ϑβ can be uniformly approximated from below by the sequence
ϑβ0 (µ) ≡ 0 ,
h i
(14)
R
ϑβn+1 (µ) = inf c(x, a) µ(dx) + βΠ a (µ, ϑβn ) ,
a∈U
E
and each ϑβn is concave, i.e. for µ, ν ∈ P(E) and α ∈ [0, 1],
(15) ϑβn (αµ + (1 − α)ν) ≥ αϑβn (µ) + (1 − α)ϑβn (ν).
P r o o f. We only point out the main steps since the proof is more or less
standard (for details see [4], Thm. 2.2).
Define, for ϑ ∈ C(P(E)),
h R i
T ϑ(µ) = inf c(x, a) µ(dx) + βΠ a (µ, ϑ) .
a∈U
E
the Bellman equation (12). Since by (A1) and (H2) the map
R
U 3a→ c(x, a) µ(dx) + βΠ a (µ, ϑβ )
E
as n → ∞, for ϕ ∈ C(E) .
Since U × P(E) is compact, to prove (H1) and (H2) it is sufficient to show
that
U × P(E) 3 (a, µ) → Π a (µ, F ) is continuous for F ∈ C(P(E)).
Partially observed Markov processes 31
= In + II n .
From (19), II n → 0, provided
R
(22) E3z→ F (M a (y, µ))r(z, y) dy ∈ C(E).
Rd
then clearly In → 0.
By (A4), for each ε > 0 there exists a compact set K ⊂ Rd such that for
any z ∈ E,
ε
(24) R(z, K c ) < .
2kF k
Therefore
R
an a
(F (M (y, µ )) − F (M (y, µ)))r(z, y) dy
n
Rd
R
≤ |F (M an (y, µn )) − F (M a (y, µ))|r(z, y) dy + ε
K
We have
Proposition 2. Under (A5) and the assumptions of Theorem 1, the
functions wβ (ν) are uniformly bounded for β ∈ (0, 1), ν ∈ P(E).
P r o o f. We improve the proof of Theorem 2 of [6]. Namely, we show
by induction the uniform boundedness of wnβ (ν) for ν ∈ P(E), β ∈ (0, 1),
n = 0, 1, . . . For n = 0, w0β (ν) ≡ 0.
Assume that for any β ∈ (0, 1), ν ∈ P(E), wnβ (ν) ≤ L where L ≥ kckλ−2 .
Partially observed Markov processes 33
For y ∈ Rd , define
0
m(y)(B) = M a (y, µn+1
β )(B) − λ2 M a (y, ν)(B)
for any B ∈ E.
By (29) we have
0
R R R R
r(z, y) P a (z1 , dz) µn+1
β (dz1 ) ≥ λ r(z, y) P a (z1 , dz) ν(dz1 )
B E B E
a
R R a
= λM (y, ν)(B) r(z, y) P (z1 , dz) ν(dz1 )
E E
0
R R
≥ λ2 M a (y, ν)(B) r(z, y) P a (z1 , dz) µn+1
β (dz1 )
E E
R R 0
× P a (z1 , dz) ν(dz1 ) − λ2 P a (z1 , dz) µn+1
β (dz 1 )
E E
2
R R 2 −1
− β(1 − λ ) ϑβn ((1 −λ ) m(y))r(z, y) dy
E R d
0
R
× P a (z1 , dz)µn+1
β (dz1 )
E
34 L. Stettner
R R
= kck + β (ϑβn (M a (y, µ)) − ϑβn (µnβ ))r(z, y) dy
E Rd
R R 0
× P a (z1 , dz) ν(dz1 ) − λ2 P a (z1 , dz) µn+1
β (dz 1 )
E E
R R
− β(1 − λ2 ) (ϑβn ((1 − λ2 )−1 m(y)) − ϑβn (µnβ ))r(z, y) dy
E R d
0
R
× P a (z1 , dz) µn+1
β (dz1 )
E
R R 0
≤ kck + βL var P a (z1 , · ) ν(dz1 ) − λ2 P a (z1 , · ) µn+1
β (dz 1 .
)
E E
Thus
β
(34) wn+1 (ν) ≤ kck + βL(1 − λ2 ) ≤ L
and the bound L is independent of ν ∈ P(E), β ∈ (0, 1). By induction
wnβ (ν) ≤ L for any ν ∈ P(E), n = 0, 1, . . . , β ∈ (0, 1). Since by the very
definition wnβ (ν) ≥ 0, and for each β, wnβ (ν) → wβ (ν) as n → ∞, we finally
obtain wβ (ν) ≤ L for ν ∈ P(E) and β ∈ (0, 1).
R e m a r k 2. One can easily see that in the case of a finite state space
E, the assumption
0
P a (z 0 , x)
(35) (A50 ) inf inf inf >0
0
z,z ∈E a,a ∈U x∈E, P 0 a (z,x)>0 P a (z, x)
with
def
R
P a (µ, C) = P a (x, C)µ(dx) .
E
We have
From (40),
R
β β
(42) w (ν) − w (µ) ≤ sup c(x, a)(ν(dx) − µ(dx))
a∈U E
n R R
+ β sup wβ (M a (y, ν))r(z, y) dy (P a (ν, dz) − λ2 P a (µ, dz))
a∈U E Rd
R R o
2 β a β a a
+ (λ w (M (y, ν)) − w (M (y, µ)))r(z, y) dy P (µ, dz)
E Rd
= I + II + III .
Now
(43) II ≤ 2kwβ k sup sup |P a (ν, B) − λ2 P a (µ, B)|
a∈U B∈E
β
= 2kw k(1 − λ(µ, ν)λ(ν, µ))
and using (41) and the nonnegativity of wβ we have
(44) III
R R
≤ sup (λ2 − 1)wβ ((1 − λ2 )−1 ma (y, µ, ν))r(z, y) dy P a (µ, dz) ≤ 0 .
a∈U E
Rd
n n n n
P n
P s = (s1 , . . . , sN ) → s = (s1 , . . . , sN ), 0 ≤ si ≤ 1, 0 ≤ si ≤ 1,
for si = 1,
si = 1, and this is satisfied since
XN X N X N
n a
sup (s − s )P (i, k) ≤ |sni − si | → 0 as sn → s .
i i
a∈U i=1 i=1
k=1
a
g a (z, x)η(dx) for C ∈ E and that
R
R e m a r k 6. Assume P (z, C) = C
the mapping
(47) U × E × E 3 (a, z, x) → g a (z, x) is continuous.
Then (A6) is satisfied.
In fact, by the Stone–Weierstrass theorem we can approximate
Pk g a uni-
formly on U ×E×E by continuous functions of the form i=1 bi (a)ci (z)di (x)
and
sup sup |P a (µn , C) − P a (µ, C)|
a∈U C∈E
R R a
≤ sup g (z, x)(µn (dz) − µ(dz))η(dx)
a∈U E E
k
X R R
≤ε+ sup |bi (a)| |di (x)|η(dx) ci (z)(µn (dz) − µ(dz)) → ε
i=1 a∈U E E
as n → ∞.
Now we can prove our main result:
Theorem 2. Assume (A1)–(A6). Then there exist w ∈ C(P(E)) and a
constant γ which are solutions to the Bellman equation
h R i
(48) w(µ) + γ = inf c(x, a) µ(dx) + Π a (µ, w) .
a∈U
E
Moreover , there exists u : P(E) → U for which the infimum on the right
hand side of (48) is attained. The strategy an = u(πn ) is optimal for Jµ and
(49) Jµ ((u(πn ))) = γ .
P r o o f. By Theorem 1, each ϑβn is concave. Therefore wnβ is concave
and wβ as limit of wnβ is also concave. Since by Proposition 2, the wβ are
uniformly bounded, and by Proposition 3 equicontinuous, from the Ascoli–
Arzelà theorem the family wβ , β ∈ (0, 1), is relatively compact in C(P(E)).
Moreover, |(1 − β)ϑβ (µβ )| ≤ kck. Therefore one can choose a subsequence
βk → 1 such that
(1 − βk )ϑβk (µβk ) → γ
and
wβk → w in C(P(E)) as k → ∞.
38 L. Stettner
References
LUKASZ STETTNER
INSTITUTE OF MATHEMATICS
POLISH ACADEMY OF SCIENCES
P.O. BOX 137
00-950 WARSZAWA, POLAND
E-mail: STETTNER@IMPAN.IMPAN.GOV.PL
Received on 22.6.1992