E.N.S.I.G.C, Chemin de la Loge, Toulouse Cedex, 31078, France. Tel: +33 62 25 23 69
Fax: +33 62 25 23 69
Published online: 16 May 2007.
To cite this article: A. S. POZNYAK & K. NAJIM (1996) Learning automata with continuous input and changing number of
actions, International Journal of Systems Science, 27:12, 1467-1472, DOI: 10.1080/00207729608929353
To link to this article: http://dx.doi.org/10.1080/00207729608929353
International Journal of Systems Science, 1996, volume 27, number 12, pages 1467-1472
A. S. POZNYAK and K. NAJIM
Introduction
Learning automata have attracted considerable interest
owing to their potential usefulness in a variety of engineering
problems characterized by nonlinearity and a
high level of uncertainty (Najim and Oppenheim 1991).
Learning automata were initially used to model
biological systems (Wiener 1948, Walter 1953, Tsetlin
1973). A learning automaton is connected in a feedback loop
to the environment, or random medium, in which it
operates: the input to one is the output of the other. The
environment is the system that communicates with the
learning automaton and supplies it with information. It
offers the automaton a finite set of actions, and the
automaton is constrained to choose one of these actions
on the basis of a probability distribution. The outputs of
the automaton form the input to the environment, and the
responses of the environment form the input to the automaton.
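The feedback connection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environment model (`environment_response`, a noisy mean loss clipped to [0, 1]) and all function names are hypothetical and are not taken from the paper, and no learning is performed here, only the exchange of actions and responses.

```python
import random

def choose_action(p, rng):
    """Sample an action index according to the probability vector p."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]

def environment_response(action, mean_loss, rng):
    """Hypothetical random medium: noisy loss of the chosen action, in [0, 1]."""
    xi = mean_loss[action] + rng.uniform(-0.1, 0.1)
    return min(1.0, max(0.0, xi))  # clip to the closed segment [0, 1]

def run_loop(p, mean_loss, steps, seed=0):
    """Automaton output -> environment input; environment output -> automaton input."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        u = choose_action(p, rng)                      # automaton acts
        xi = environment_response(u, mean_loss, rng)   # environment reacts
        history.append((u, xi))
    return history
```

A call such as `run_loop([0.2, 0.5, 0.3], [0.1, 0.5, 0.9], steps=50)` returns the sequence of (action, response) pairs that a reinforcement scheme would consume.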
In adaptive control (Najim and M'Saad 1991), the
behaviour of the system is improved slightly at every
sampling period by estimating in real time the parameters
(model or control law parameters) needed to attain the desired
control objective. In learning automata (Narendra and
Thathachar 1989), the probability distribution $p_n$ is
[Figure: feedback connection of the LEARNING AUTOMATON and the RANDOM ENVIRONMENT; signal $u_n$.]
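$\{\xi_n\}$, $\xi_n \in \ldots$
where $W = 2^N - 1$,
$$q_n(j) = \operatorname{prob}[V_n = V(j) \mid \mathcal{F}_{n-1}], \quad j = 1, \ldots, W,$$
and
$$\sum_{i=1}^{N} p_n(i) = 1, \quad \forall n.$$
$$p_{n+1} = T(p_n, \xi_n, u_n, V_n),$$
$$p_n^*(i/j) := \frac{p_n(i)}{K_n(j)}, \tag{1}$$
$$K_n(j) := \sum_{i:\, u(i) \in V(j)} p_n(i) > 0. \tag{2}$$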
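As a small illustration of the rescaling to the currently available action subset, the sketch below assumes that the scaled probability of action $u(i)$ in the subset $V(j)$ is $p_n(i)$ divided by $K_n(j)$, the total probability of the actions in $V(j)$. The function name and the representation of a subset as a list of indices are assumptions made for this example.

```python
def scaled_probabilities(p, subset):
    """Rescale the probabilities of the actions in `subset` so that they sum to one.

    p      : list of action probabilities p_n(i), summing to 1
    subset : indices i of the actions u(i) belonging to V(j)
    Returns (scaled, K) with scaled[i] = p[i] / K and K = sum of p[i] over the subset.
    """
    K = sum(p[i] for i in subset)
    if K <= 0:
        raise ValueError("the subset must carry positive probability")
    scaled = {i: p[i] / K for i in subset}
    return scaled, K
```

For instance, with `p = [0.5, 0.3, 0.2]` and the subset `[0, 2]`, the normalizing constant is the sum of the first and third probabilities, and the two scaled probabilities sum to one.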
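i.e.
$$p_n(\alpha) \xrightarrow[n \to \infty]{\text{a.s.}} 1, \tag{3}$$
$$p_{n+1}(i) = p_{n+1}^*(i/j)\, K_n(j), \tag{4}$$
3.
$$n^{\nu}\, E\{\xi_n(u(\alpha), \omega) \mid \mathcal{F}_{n-1}\} \xrightarrow[n \to \infty]{\text{a.s.}} 0, \quad \nu > 0, \tag{5}$$
where
$$s_n(i) := \frac{\sum_{t=1}^{n} \xi_t\, \chi(u_t = u(i))}{\sum_{t=1}^{n} \chi(u_t = u(i))}, \quad i = 1, \ldots, N,$$
$$[x]_+ := \begin{cases} x, & \text{if } x \ge 0, \\ 0, & \text{if } x < 0, \end{cases} \qquad \chi(u_n = u(i)) := \begin{cases} 1, & \text{if } u_n = u(i), \\ 0, & \text{if } u_n \ne u(i). \end{cases} \tag{6}$$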
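The three auxiliary quantities that appear here (the positive part $[x]_+$, the indicator $\chi$, and the empirical mean response $s_n(i)$ of action $u(i)$) can be computed as in the following sketch. Actions are represented by integer indices, and returning `None` for an action that has never been selected is a choice made for this example, not something stated in the paper.

```python
def positive_part(x):
    """[x]_+ : x if x >= 0, otherwise 0."""
    return x if x >= 0 else 0.0

def indicator(u_n, u_i):
    """chi(u_n = u(i)) : 1 if the selected action equals u(i), otherwise 0."""
    return 1 if u_n == u_i else 0

def running_mean_response(actions, responses, i):
    """s_n(i): average of the responses xi_t over the instants t at which u_t = u(i)."""
    num = sum(xi * indicator(u, i) for u, xi in zip(actions, responses))
    den = sum(indicator(u, i) for u in actions)
    return num / den if den > 0 else None  # undefined until u(i) has been selected
```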
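Note that $\xi_n \in [0, 1]$, and
$$\sum_{n=1}^{\infty} \gamma_n \min_j \left\{ \frac{q_n(j)}{N(j) - 1} \right\} = \infty,$$
$$\lim_{n \to \infty} E\{\xi_n \mid u_n = u(\alpha)\} = 0.$$
For $u_n = u(s) \in V_n = V(j)$,
$$p_{n+1}^*(\alpha/j) = p_n^*(\alpha/j) + \gamma_n \left\{ \delta_{s\alpha} - p_n^*(\alpha/j) + \xi_n \frac{1 - N(j)\,\delta_{s\alpha}}{N(j) - 1} \right\}. \tag{7}$$
The main result of this study is as follows.
Theorem 1:
$$\ldots \tag{8}$$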
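One step of a reinforcement scheme of this kind can be sketched as follows. The sketch assumes a normalized Bush-Mosteller-type update applied to the probabilities rescaled to the current subset $V(j)$, followed by multiplication by $K_n(j)$ to return to the unscaled probabilities; actions outside $V(j)$ are assumed to keep their probabilities. The function name, the argument layout, and the requirement of at least two actions in the subset are assumptions of this example rather than statements from the paper.

```python
def update_step(p, subset, s, xi, gamma):
    """One update of the action probabilities.

    p      : list of probabilities p_n(i), summing to 1
    subset : indices of the actions in the current set V(j), len(subset) >= 2
    s      : index of the action actually selected (must belong to subset)
    xi     : environment response in [0, 1] (smaller is better)
    gamma  : gain in (0, 1]
    """
    if s not in subset or len(subset) < 2:
        raise ValueError("s must belong to a subset with at least two actions")
    K = sum(p[i] for i in subset)          # K_n(j), total probability of V(j)
    Nj = len(subset)                       # N(j), number of actions in V(j)
    new_p = list(p)                        # actions outside V(j) are left unchanged
    for i in subset:
        delta = 1.0 if i == s else 0.0     # Kronecker delta
        p_star = p[i] / K                  # scaled probability
        p_star_next = p_star + gamma * (
            delta - p_star + xi * (1.0 - Nj * delta) / (Nj - 1)
        )
        new_p[i] = p_star_next * K         # rescale back
    return new_p
```

With this form the scaled probabilities still sum to one after the update, so the total probability of the subset, and hence of the full action set, is preserved.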
Hence
$$E\{p_{n+1}(\alpha) \mid \mathcal{F}_{n-1} \wedge V_n = V(j)\} = \ldots \tag{9}$$
Observe that $\ldots$
$$E\{p_{n+1}(\alpha) \mid \mathcal{F}_{n-1} \wedge V_n = V(j)\} = p_n(\alpha) + \gamma_n\, \chi(u(\alpha) \in V(j)) \left[ \frac{1}{N(j) - 1} \sum_{s \ne \alpha:\, u(s) \in V(j)} c_n(s)\, p_n(s) - c_n(\alpha)\, p_n(\alpha) \right], \tag{10}$$
where $\ldots$ $\forall s = 1, \ldots, N$, $s \ne \alpha$, and
$$C_{N-2}^{j} = \frac{(N-2)!}{j!\,(N-2-j)!} \tag{11}$$
represents the combinatorial function, we obtain
$$\ldots$$
By averaging over the probability distribution $q_n$ and the
index $j$, and taking into account that $p_n$ is $\mathcal{F}_{n-1}$-measurable
($p_n = T(p_{n-1}, \xi_{n-1}, u_{n-1}, V_{n-1})$), we obtain
$$E\{1 - p_{n+1}(\alpha) \mid \mathcal{F}_{n-1}\} = 1 - p_n(\alpha) - \gamma_n \sum_{j=1}^{W} q_n(j)\, \chi(u(\alpha) \in V(j)) \left[ \frac{1}{N(j) - 1} \sum_{s \ne \alpha:\, u(s) \in V(j)} c_n(s)\, p_n(s) - c_n(\alpha)\, p_n(\alpha) \right], \tag{12}$$
$$\ldots\; \gamma_n c^{-} \min_j \frac{q_n(j)}{N(j) - 1} + \gamma_n\, o(n^{-\nu}). \tag{14}$$
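Corollary 1: If
$$q_n(j) = q(j), \qquad \gamma_n = \frac{\gamma}{n + a}, \qquad \ldots > \gamma > 0, \qquad \gamma \frac{c}{n + a} < 1,$$
$$\sum_{j=1}^{W} q_n(j)\, \chi(u(\alpha) \in V(j)) \ldots$$
$$n^{\rho} (1 - p_n(\alpha)) \xrightarrow[n \to \infty]{\text{a.s.}} 0, \qquad \ldots$$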
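To get a feeling for the role of a gain of the form $\gamma_n = \gamma/(n + a)$, the following sketch iterates the deterministic recursion $u_{n+1} = u_n (1 - c\,\gamma_n)$. This recursion is a deliberate simplification introduced for this example (it replaces the stochastic inequality for $1 - p_n(\alpha)$ by an equality with a hypothetical constant $c$); it is not the argument used in the paper. Under this simplification $u_n$ decays roughly like $n^{-\gamma c}$, so $n^{\rho} u_n$ shrinks whenever $\rho < \gamma c$.

```python
def decay(n_steps, gamma=1.0, c=0.5, a=1.0, u0=1.0):
    """Iterate u_{k+1} = u_k * (1 - c * gamma / (k + a)) for k = 1, ..., n_steps.

    gamma, c, a and u0 are illustrative values (assumptions), chosen so that
    c * gamma / (k + a) < 1 for every k >= 1.
    """
    u = u0
    for k in range(1, n_steps + 1):
        u *= 1.0 - c * gamma / (k + a)
    return u

def scaled_error(n_steps, rho, **kwargs):
    """n^rho * u_n, the quantity whose convergence to zero expresses the rate."""
    return (n_steps ** rho) * decay(n_steps, **kwargs)
```

Comparing `scaled_error(1000, 0.25)` with `scaled_error(10000, 0.25)` shows the scaled error decreasing for an exponent below $\gamma c = 0.5$.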
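Then
$$\lim_{n \to \infty} \ldots \overset{\text{a.s.}}{=} 0, \qquad \nu \le \rho,$$
$\ldots$ $(\rho = \mu)$ can be $\ldots$
$$\frac{1}{c(1 + a)} < \gamma \qquad \text{and} \qquad 0 < a < \ldots - 1.$$
4.
$$u_n := 1 - p_n(\alpha), \qquad v_n := n^{\mu}, \qquad \alpha_n = \gamma_n c, \qquad \beta_n = \gamma_n\, o(n^{-\nu}), \qquad \rho \le \mu < \gamma c < 1.$$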
Conclusion
Appendix
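Let $u_n$ be $\ldots$
$$E(u_{n+1} \mid \mathcal{F}_n) \overset{\text{a.s.}}{\le} u_n (1 - \alpha_n) + \beta_n, \qquad \beta_n \ge 0,$$
$$\sum_{n=1}^{\infty} \alpha_n \overset{\text{a.s.}}{=} \infty, \qquad \sum_{n=1}^{\infty} \beta_n v_n \overset{\text{a.s.}}{<} \infty,$$
and the limit
$$\lim_{n \to \infty} \frac{v_{n+1} - v_n}{\alpha_n v_n} := \mu < 1$$
exists then
$$u_n = o_{\omega}\!\left(\frac{1}{v_n}\right) \xrightarrow{\text{a.s.}} 0, \qquad v_n \to \infty.$$
With $\tilde{u}_n := u_n v_n$,
$$E(\tilde{u}_{n+1} \mid \mathcal{F}_n) \overset{\text{a.s.}}{\le} \tilde{u}_n (1 - \alpha_n) \frac{v_{n+1}}{v_n} + v_{n+1} \beta_n = \tilde{u}_n \left[ 1 - \alpha_n (1 - \mu + o(1)) \right] + v_{n+1} \beta_n,$$
$$\ldots + \theta_n \quad (\{\tilde{u}_n\} \text{ is a quasimartingale}), \qquad \sum_{n=1}^{\infty} E(\theta_n) < \infty,$$
which is equivalent to $\ldots$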
References
BUSH, R. R., and MOSTELLER, F., 1958, Stochastic Models for Learning
(New York: Wiley).
NAJIM, K., and M'SAAD, M., 1991, Adaptive control: theory and
practical aspects. Journal of Process Control, 1, 84-95.
NAJIM, K., and OPPENHEIM, G., 1991, Learning systems: theory and
application. Proceedings of the Institution of Electrical Engineers, Pt
E, 138, 183-192.
NAJIM, K., and POZNYAK, A. S., 1994, Learning Automata: Theory and
Applications (Oxford, U.K.: Pergamon Press).