Abstract: A dispatching algorithm for an elevator group control system is proposed based on reinforcement learning. Elevator dispatching is modeled as a Markov decision process. Then an internally recurrent neural network based reinforcement learning method is designed to find the optimal dispatching policy while the state-action value function is iteratively approximated. Finally, several simulated experiments are done to compare the trained dispatching policy with traditional ones. The experimental results demonstrate the effectiveness of the proposed dispatching method.
Key Words: Elevator Group Control, Dispatching Algorithm, Reinforcement Learning, Neural Network
1 INTRODUCTION

The dispatching problem in elevator group supervisory control systems has been investigated extensively due to its high practical significance. Stochastic models such as the Markov decision process (MDP) have been used to model the elevator group control problem [1][2]. Reinforcement learning, an approximate dynamic programming method for solving MDP problems, has drawn increasing attention from researchers in artificial intelligence, control theory and operations research, because it can learn the optimal policy from interaction with the environment.

In the literature, some results have been attained by applying reinforcement learning to the design of supervisory control and optimization algorithms [3-7]. Q-learning based elevator group control algorithms are the most widely discussed. For example, reference [3] designs multiple agents with Q-learning ability that decide whether each elevator car should stop or not. When Q-learning is used to solve a large-scale, complex dynamic optimization problem, value function approximation is critical. A neural network is an effective way to store the mapping from state-action pairs to values and to find the optimal value function. Feedforward neural networks are used most often, such as the BP neural network in references [3][4][5] and the CMAC neural network in reference [6].

In this paper we focus on the elevator dispatching problem, where the objective is to allocate a car to serve a new hall call, rather than on elevator group control that makes a car run or stop. Firstly, we model the dispatching problem as an MDP and define the state set, action set and immediate reward. Then an internally recurrent neural network based reinforcement learning method is designed to find the optimal dispatching policy while the state-action value function is iteratively approximated. The recurrent neural network can increase the learning speed due to its memory of past input/output information. To balance exploration and exploitation, the Boltzmann distribution is used to select an action from the action space. The algorithm is trained on data derived from simulated experience in a virtual environment. Finally, under several different traffic flows, simulated experiments are done to compare the learned policy with traditional dispatching methods. The experimental results demonstrate the effectiveness of the proposed dispatching method.

This work is supported by Hebei Province's University Scientific Research Program under Grant z2012016 and Tianjin Education Commission Scientific Research Program under Grant 20120833.

2 PRELIMINARIES

2.1 Dispatching Problem of Elevator Group System

The elevator group control system can be considered as a discrete event dynamic system. An elevator system schematic diagram is shown in Fig. 1. Passengers arrive at an elevator system randomly. When a passenger arrives at a landing floor and gives a hall call, the group control system allocates the call to the most suitable elevator. Each elevator control system handles functions such as car running, stopping and door opening based on the call allocation message sent from the group controller. The allocation decision is made by optimizing a cost function. A number of costs, such as call time, passenger waiting and journey times, car load factor, energy consumption, transportation capacity, and number of starts, can be considered during the call allocation [8]. So the dispatching problem is usually modeled as a static or dynamic optimization problem. In
this paper, we formulate the dispatching problem from the viewpoint of a Markov decision process.

And the optimal action-value function is

    Q^*(s, u) \triangleq \max_{\pi} Q^{\pi}(s, u)    (4)

[Figure: Q-learning structure with Q-value updating and reward computing blocks; inputs are the state s_t and reward R_t, output is the Q-value Q_t]
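The iteration toward the optimal action-value function in (4) can be sketched with a minimal tabular Q-learning stand-in, using Boltzmann (Gibbs) action selection with the decaying temperature T_K = d^K T_0 and the parameters reported in the experiments (α = 0.2, γ = 0.9, d = 0.98, T_0 = 1000). This is only an illustrative sketch: the paper's internally recurrent network approximator, elevator state encoding and reward definition are omitted, and the state/action placeholders below are hypothetical.

```python
import math
import random
from collections import defaultdict


class BoltzmannQLearner:
    """Tabular sketch of the Q-learning scheme; a dict stands in for
    the paper's recurrent-network approximator of Q(s, u)."""

    def __init__(self, actions, alpha=0.2, gamma=0.9, T0=1000.0, d=0.98):
        self.Q = defaultdict(float)       # Q(s, u), zero-initialised
        self.actions = actions            # action set (e.g. candidate cars)
        self.alpha, self.gamma = alpha, gamma
        self.T0, self.d, self.k = T0, d, 0

    def select(self, s):
        # Boltzmann (Gibbs) exploration: P(u|s) proportional to
        # exp(Q(s,u) / T_k), with temperature T_k = d**k * T0.
        T = max(self.d ** self.k * self.T0, 1e-3)
        weights = [math.exp(self.Q[(s, u)] / T) for u in self.actions]
        return random.choices(self.actions, weights=weights)[0]

    def update(self, s, u, reward, s_next):
        # One-step Q-learning toward the optimum of Eq. (4):
        # Q(s,u) <- Q(s,u) + alpha * [r + gamma * max_u' Q(s',u') - Q(s,u)]
        target = reward + self.gamma * max(
            self.Q[(s_next, v)] for v in self.actions
        )
        self.Q[(s, u)] += self.alpha * (target - self.Q[(s, u)])
        self.k += 1                       # anneal the temperature
```

For a four-car group one might instantiate `BoltzmannQLearner(actions=[0, 1, 2, 3])` and, on each new hall call, call `select` with the current (encoded) system state and `update` once the immediate reward is observed.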
With α = 0.2, β = 0.2 and γ = 0.9, the temperature T in the Gibbs distribution is decayed as T_K = d^K · T_0 (d = 0.98, T_0 = 1000).

5.2 Simulation Results

The learning process and simulated experiments are all done in a virtual environment [9], as shown in Fig. 4.

Fig 4. Elevator group system simulation software

We use three different traffic flows for dispatching agent training and for comparison between the proposed method and two other classic ones. The data sets of these traffic flows are shown in Table 3. The three modes are pure up-peak traffic (TF1, Table 3(a)), pure down-peak traffic (TF2, Table 3(b)) and down-peak with light up traffic (TF3, Table 3(c)). After 20 rounds of training in each traffic mode, the dispatching policy is nearly stable; that is, the dispatching scheme the algorithm produces no longer changes when it runs in situations it has already learned.

Table 3(a). Pure up-peak traffic (TF1)
Time        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Persons    18  11  15   8   8  34  46  40  34  46   9  10   6   6   9

Table 3(b). Pure down-peak traffic (TF2)
Time        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Persons    17  17  13  13  10  34  31  26  41  38  13  12  14   8  12

Table 3(c). Down-peak with light up traffic (TF3)
Time             1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Persons (Up)     0   1   2   2   0   2   2   0   2   0   1   2   1   0   0
Persons (Down)   7  10   7   9  12  10   8   8  10  18  11   6  12   2  15

Then we use two classic methods, Static Zoning (SZ) and a Genetic Algorithm based method [10] (GA), with the same traffic data to evaluate the performance of the reinforcement learning based dispatching technology. Results of the comparison are shown in Table 4. Three common performance indices are used: average waiting time, average journey time and average crowding.

According to the data in Table 4, the RL based method has better adaptability to different traffic flows than SZ and GA, although it is not better than SZ in TF1 and TF2 and is a little worse than GA in TF3. The RL method is not customized to a specific traffic flow mode. It is the nature of learning from experience that lets this dispatching algorithm handle varying, unknown traffic and thus achieve good average performance.

6 CONCLUSIONS

Elevator group systems have the characteristics of a huge state space and random passenger arrivals, which make the dispatching problem non-trivial. In this paper we try to design the optimal policy in a learning way. The Markov decision process and reinforcement learning are suitable models for this problem. Q-learning is used to find the optimal dispatching action while it approximates the action-value function of the MDP. Internally recurrent neural networks are introduced to store the value function; the recurrent neural network can increase the learning speed due to its memory of past input/output information. Several simulated experiments verify the effectiveness of the learning-style dispatching method in different modes of traffic flow. The method has better adaptability to varying traffic load than the other two.

REFERENCES

[1] D. L. Pepyne, C. G. Cassandras, Optimal dispatching control for elevator systems during uppeak traffic, IEEE Transactions on Control Systems Technology, Vol.5, No.6, 629-643, 1997.
[2] M. Brand, D. Nikovski, Optimal parking in group elevator control, Proceedings of the IEEE International Conference on Robotics and Automation, 1002-1008, 2004.
[3] R. H. Crites, A. G. Barto, Elevator group control using multiple reinforcement learning agents, Machine Learning, Vol.33, No.2, 235-262, 1998.
[4] Z. L. Zong, X. G. Wang, Z. Tang, G. Z. Zeng, Elevator group control algorithm based on residual gradient and Q-learning, SICE 2004 Annual Conference, Vol.1, 329-331, 2004.
[5] Q. Zong, C. F. Song, G. S. Xing, A study of elevator dynamic scheduling policy based on reinforcement learning, Elevator World, Vol.1, 58-64, 2006.
[6] Y. Gao, J. K. Hu, B. N. Wang, D. L. Wang, Elevator group control using reinforcement learning with CMAC, Acta Electronica Sinica, Vol.35, No.2, 362-365, 2007.
[7] F. L. Zeng, Q. Zong, Z. Y. Sun, L. Q. Dou, Self-adaptive multi-objective optimization method design based on agent reinforcement learning for elevator group control systems, Proceedings of the 8th World Congress on Intelligent Control and Automation, 2577-2582, 2010.
[8] A. Fujino, T. Tobita, K. Yoneda, An on-line tuning method for multi-objective control of elevator group, Proceedings of the International Conference on Industrial Electronics, Control, Instrumentation and Automation, Vol.2, 795-800, 1992.
[9] Q. Zong, Y. Z. He, L. J. Wei, Modeling and research for agent-oriented elevator group control simulation system, Journal of System Simulation, Vol.18, No.5, 1391-1393, 2006.
[10] L. H. Xue, Fuzzy neural network based elevator group control method with genetic algorithm, Master Thesis, Tianjin University, 2002.