ACTA AUTOMATICA SINICA, May 2016, 42(5): 676-684
DOI 10.16383/j.aas.2016.y000002
Fig. 1 A convolutional neural network learns a mapping from game screens to game policy
The Arcade Learning Environment (ALE) was proposed, in the spirit of benchmarks such as ImageNet, as a standard evaluation platform for general game-playing agents [44]. ALE is built on an emulator of the ATARI 2600 console and exposes more than 50 games [44]. Each game screen is a 160 x 210 image drawn from a 128-color palette, and the agent selects among up to 18 joystick actions. Early agents on ALE relied on hand-designed state features, including: 1) BASIC features, which record the colors present in each cell of a coarse grid over the screen; 2) BASS features, which extend BASIC with pairwise feature combinations; 3) DISCO features, which detect and track screen objects; and 4) LSH features, which hash raw screens into compact codes with locality-sensitive hashing (Locality-sensitive Hashing) [44]. Contingency awareness features were later introduced to identify the screen regions whose contents are under the agent's control [46]. Bellemare et al. further applied the tug-of-war sketch to hash very large feature sets into a fixed-size representation [47].
In reinforcement learning, the agent observes states, executes actions, and receives a scalar reward (Reward) signal; from this experience it estimates a value function (Value function). The action-value function Q(s, a; θ), with parameters θ, estimates the expected cumulative discounted reward of taking action a in state s. Q-learning [45] updates the estimate from each transition:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_b Q(s_{t+1}, b) - Q(s_t, a_t) ]

where s_t, a_t, and r_t denote the state, action, and reward at step t, s_{t+1} denotes the state at step t+1, α is the learning rate, and γ is the discount factor. Pairing neural networks with such value-learning methods is not new: Tesauro at the IBM Watson Research Center trained TD-Gammon, a self-teaching neural-network backgammon (Backgammon) program, to master-level play as early as 1994 [48]. In 2013, DeepMind revived this line of work by combining Q-learning with deep convolutional networks to play ATARI games directly from screen images.
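The update rule above can be sketched in a few lines of Python. The two-state deterministic MDP here is an illustrative assumption, not an environment from the paper:

```python
import random

random.seed(0)

# Toy deterministic MDP: transitions[state][action] = (reward, next_state)
transitions = {
    0: {0: (0.0, 0), 1: (1.0, 1)},
    1: {0: (0.0, 0), 1: (2.0, 1)},
}

alpha, gamma = 0.1, 0.9                      # learning rate, discount factor
Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}

s = 0
for _ in range(5000):
    # epsilon-greedy behaviour policy
    if random.random() < 0.1:
        a = random.choice((0, 1))
    else:
        a = max(Q[s], key=Q[s].get)
    r, s_next = transitions[s][a]
    # Q-learning update: the target bootstraps from the max over next actions
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])
    s = s_next

# Both greedy actions should be 1, and Q(1, 1) should approach 2/(1-0.9) = 20
print(max(Q[0], key=Q[0].get), max(Q[1], key=Q[1].get), round(Q[1][1], 1))
```

Because the update is off-policy (the max appears in the target regardless of the action actually taken), the table converges to the optimal values even under the exploratory behaviour policy.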
Tabulating Q values directly is infeasible for raw game screens, so the value function must generalize (Generalization) across states. Early work used linear function approximation over hand-crafted features, Q(s, a; θ) = θ^T φ(s, a); Bellemare et al. evaluated SARSA with such linear features on ALE [44]. Riedmiller's neural fitted Q iteration then showed that a neural network can serve as the approximator when trained on batches of stored experience [49]. Deep Q-learning stabilizes fully online training with two further ingredients: experience replay (Experience replay) and a separate target network (Target network). Each transition (s, a, r, s') is stored in a replay memory D, and the parameters θ are updated by stochastic gradient descent on the loss

L(θ) = E_{(s,a,r,s') ~ U(D)} [ ( r + γ max_b Q(s', b; θ⁻) - Q(s, a; θ) )² ]

where U(D) denotes sampling uniformly at random from D, which breaks the temporal correlation between consecutive samples and makes the training data approximately independent and identically distributed (i.i.d.), and θ⁻ denotes the parameters of the target network, which are held fixed between periodic copies of θ.
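A minimal sketch of this training loop is given below, using a linear Q-function in place of a deep network (an illustrative simplification) so that only the two stabilizing ingredients are on display: uniform sampling from the replay memory, and a target parameter vector synced only periodically. The toy environment generator is invented for the example:

```python
import random

random.seed(0)
N_FEATURES, N_ACTIONS, GAMMA, LR = 4, 2, 0.9, 0.01

def q_value(theta, phi, a):
    # Linear approximation: Q(s, a; theta) = theta[a] . phi(s)
    return sum(w * x for w, x in zip(theta[a], phi))

def random_transition():
    # Illustrative stand-in for environment interaction:
    # (phi(s), a, r, phi(s')) with random features, reward tied to feature 0
    phi = [random.random() for _ in range(N_FEATURES)]
    phi_next = [random.random() for _ in range(N_FEATURES)]
    return (phi, random.randrange(N_ACTIONS), phi[0], phi_next)

theta = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]   # online parameters
theta_target = [row[:] for row in theta]                 # frozen copy theta^-
replay = []                                              # replay memory D

for step in range(2000):
    replay.append(random_transition())
    batch = random.sample(replay, min(32, len(replay)))  # uniform U(D)
    for phi, a, r, phi_next in batch:
        # TD target uses the *target* parameters theta^-
        y = r + GAMMA * max(q_value(theta_target, phi_next, b)
                            for b in range(N_ACTIONS))
        td_error = y - q_value(theta, phi, a)
        # SGD step on the squared error (gradient of a linear Q is phi)
        for i in range(N_FEATURES):
            theta[a][i] += LR * td_error * phi[i]
    if step % 100 == 0:
        theta_target = [row[:] for row in theta]         # periodic sync
```

Swapping the linear `q_value` for a convolutional network and the random transitions for emulator interaction recovers the structure of deep Q-learning.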
These ingredients, combined with a deep convolutional network, constitute DeepMind's deep Q-network (Deep Q-network, DQN), which reached human-level control on many ATARI 2600 games [50]. Several refinements followed. Bellemare et al. observed that the standard Bellman backup can leave the estimated values of suboptimal actions close to that of the optimal action, and proposed the consistent Bellman operator (Consistent Bellman operator), which increases this action gap and yields more reliable greedy policies, outperforming DQN on the ATARI benchmark [56]. Wang et al. of DeepMind proposed the dueling network (Dueling network) architecture, which splits the Q-network into a state-value stream and an action-advantage stream that are recombined at the output; the decomposition lets the network learn how good a state is without having to estimate the effect of every action, and improves performance over DQN [57].
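The dueling architecture's aggregation step can be illustrated without the convolutional trunk; the value and advantage numbers below are made up for the example:

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and advantages A(s, a) into Q-values.

    Subtracting the mean advantage is the identifiability trick used by the
    dueling architecture: it pins down how V and A share the Q estimate.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# Hypothetical outputs of the two streams for one state with three actions
q = dueling_q(5.0, [1.0, -1.0, 0.0])
print(q)  # -> [6.0, 4.0, 5.0]
```

Note that adding a constant to all advantages and subtracting it from the value leaves the output unchanged, which is exactly the ambiguity the mean-subtraction removes.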
Real game play is often only partially observable (Partial observability): a single frame does not reveal, for example, object velocities. Hausknecht and Stone extended deep Q-learning to this setting with the deep recurrent Q-network (DRQN). DRQN keeps DQN's convolutional layers (CNNs) for processing ATARI 2600 screens but replaces the first fully connected layer with a long short-term memory (Long short-term memory, LSTM) layer, so that the Q-function can integrate information across frames; DRQN matches DQN's scores while degrading far more gracefully when observations are masked or flickering [41]. Schaul et al. of DeepMind instead improved how the replay memory is used: prioritized experience replay samples transitions with probability related to the magnitude of their temporal difference error (Temporal difference error), replaying surprising transitions more often and accelerating learning [51].
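Proportional prioritization can be sketched with standard-library tools alone; the priority exponent follows the usual notation, but the toy buffer contents are invented for the example:

```python
import random

random.seed(0)

class PrioritizedReplay:
    """Toy proportional prioritized replay: P(i) ~ |td_error_i| ** alpha."""

    def __init__(self, alpha=0.6, eps=1e-3):
        self.alpha, self.eps = alpha, eps
        self.items, self.priorities = [], []

    def add(self, transition, td_error):
        self.items.append(transition)
        # eps keeps zero-error transitions sampleable
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k):
        # random.choices draws with replacement, weighted by priority
        return random.choices(self.items, weights=self.priorities, k=k)

buf = PrioritizedReplay()
buf.add("small-error transition", td_error=0.01)
buf.add("large-error transition", td_error=5.0)

batch = buf.sample(1000)
# The high-error transition should dominate the sampled batch
print(batch.count("large-error transition") > batch.count("small-error transition"))
```

A production implementation would use a sum-tree for O(log n) sampling and importance-sampling weights to correct the induced bias, but the selection principle is the same.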
Training speed is another concern: the original DQN needed roughly 8 days on a GPU for a single game. Mnih et al. of DeepMind proposed asynchronous methods, including asynchronous advantage actor-critic (A3C), in which many parallel actor-learners on a multi-core CPU update shared parameters; the diversity of parallel experience decorrelates the updates without a replay memory, and training time is roughly halved [58]. Guo et al. took a planning route instead: a slow offline Monte Carlo tree search player generates high-quality moves, and a fast convolutional network is trained to imitate them, yielding real-time ATARI play that outperforms DQN [53].
Computer Go has likewise been transformed by deep networks. Clark and Storkey trained deep convolutional networks to predict expert moves, reaching prediction accuracies of 41.1% and 44.4% on two Go datasets; without any search, their networks could defeat the traditional program GnuGo and occasionally beat the Monte Carlo program Fuego [61]. Maddison et al. of DeepMind trained a 12-layer convolutional network on games from the KGS Go server, reaching 55% move-prediction accuracy; played greedily, the network beat GnuGo in 97% of games and matched a Monte Carlo tree search program simulating two million rollouts per move [62]. Tian and Zhu of Facebook combined move prediction with long-term prediction of the next several moves in their program DarkForest [63].

5 Poker games

Poker is a representative imperfect-information game. Bowling et al. showed that heads-up limit Texas hold'em is essentially solved, using a variant of counterfactual regret (Regret) minimization [65]. Yakovenko et al. proposed Poker-CNN, a convolutional pattern-learning strategy for making draws and bets across several poker variants [66]. Heinrich and Silver of DeepMind proposed neural fictitious self-play (Neural fictitious self-play, NFSP), which combines fictitious play (Fictitious play) with deep neural networks; fictitious play is known to converge in two-player zero-sum games (Zero-sum games) and in potential games (Potential games), and NFSP approaches equilibrium behavior in poker without built-in domain knowledge [67].
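The fictitious-play idea underlying NFSP is simple to demonstrate on a small zero-sum game: each player best-responds to the opponent's empirical action frequencies. The payoff matrix below is standard rock-paper-scissors; the horizon and count initialization are illustrative choices:

```python
# Fictitious play on rock-paper-scissors (actions: 0=rock, 1=paper, 2=scissors)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]                    # row player's payoff

def best_response(opp_counts, payoff_rows):
    """Best pure action against the opponent's empirical mixed strategy."""
    total = sum(opp_counts)
    values = [sum(payoff_rows[a][b] * opp_counts[b] / total for b in range(3))
              for a in range(3)]
    return max(range(3), key=lambda a: values[a])

counts = [[1, 1, 1], [1, 1, 1]]          # empirical counts for both players
col_payoff = [[-PAYOFF[b][a] for b in range(3)] for a in range(3)]

for _ in range(3000):
    a0 = best_response(counts[1], PAYOFF)      # row player's best response
    a1 = best_response(counts[0], col_payoff)  # column player's best response
    counts[0][a0] += 1
    counts[1][a1] += 1

freq = [c / sum(counts[0]) for c in counts[0]]
print([round(f, 2) for f in freq])       # drifts toward the uniform equilibrium
```

The empirical frequencies converge to the unique Nash equilibrium (1/3, 1/3, 1/3), the zero-sum convergence property the text refers to; NFSP replaces the exact best response and the empirical average with learned networks.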
6 Discussion

AlphaGo brings these threads together: policy and value networks are initialized from human expert games and then improved through self-play (Self-play), and both are integrated with Monte Carlo tree search [64]. Unlike the ATARI setting, where the emulator supplies a reward at every step, Go gives feedback only at the end of a game; AlphaGo therefore generates its own training data through self-play, using final game outcomes as the ground truth (Ground-truth) labels for its value network. This combination defeated professional human players, a landmark for the field [64]. The contrast with earlier game-playing systems is instructive: Chinook, the world man-machine checkers champion, relied on alpha-beta search with handcrafted evaluation functions and endgame databases [68], whereas AlphaGo and the ATARI agents learn their evaluations from raw experience. Looking forward, richer environments raise new challenges: three-dimensional worlds such as Minecraft, and real-time strategy games such as StarCraft, which DeepMind has discussed as a possible next target, demand long horizons, partial observability, and very large action spaces.
7 Conclusion

In only a few years, deep reinforcement learning has progressed from learning individual ATARI games directly from screen images to defeating professional Go players with AlphaGo, and game platforms remain a driving benchmark for the field. Many open problems remain, and the methods surveyed here are likely to continue developing rapidly.
31 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444

32 Lee H, Grosse R, Ranganath R, Ng A Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA: ACM, 2009.

33 Yao A C C. Separating the polynomial-time hierarchy by oracles. In: Proceedings of the 26th Annual Symposium on Foundations of Computer Science. Portland, OR, USA: IEEE, 1985. 1-10

34 Hastad J. Almost optimal lower bounds for small depth circuits. In: Proceedings of the 18th Annual ACM Symposium on Theory of Computing. New York, NY, USA: ACM, 1986.

35 Braverman M. Poly-logarithmic independence fools bounded-depth Boolean circuits. Communications of the ACM, 2011, 54(4): 108-115

36 Bengio Y, Delalleau O. On the expressive power of deep architectures. In: Algorithmic Learning Theory. Berlin Heidelberg: Springer, 2011. 18-36

37 Le Cun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, Jackel L D. Handwritten digit recognition with a back-propagation network. In: Proceedings of the 1990 Advances in Neural Information Processing Systems. San Francisco: Morgan Kaufmann, 1990.

38 Bengio Y, LeCun Y, DeCoste D, Weston J. Scaling learning algorithms towards AI. In: Large-Scale Kernel Machines. Cambridge: MIT Press, 2007.

39 Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.

40 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 1996, 4: 237-285

41 Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: Proceedings of the 2015 AAAI Fall Symposium Series. Arlington, Virginia: AAAI, 2015.

48 Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 1994, 6(2): 215-219

49 Riedmiller M. Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In: Proceedings of the 16th European Conference on Machine Learning. Porto, Portugal: Springer, 2005.

50 Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533

51 Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proceedings of the 2016 International Conference on Learning Representations. San Juan, Puerto Rico: ICLR, 2016.

52 Ross S, Gordon G J, Bagnell J A. A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Ft. Lauderdale, FL, USA: AISTATS, 2011.

53 Guo X X, Singh S, Lee H, Lewis R, Wang X S. Deep learning for real-time ATARI game play using offline Monte-Carlo tree search planning. In: Proceedings of the 2014 Advances in Neural Information Processing Systems. Cambridge: The MIT Press, 2014.

54 Schulman J, Levine S, Moritz P, Jordan M, Abbeel P. Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015.

55 van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA: AAAI, 2016.

56 Bellemare M G, Ostrovski G, Guez A, Thomas P S, Munos R. Increasing the action gap: new operators for reinforcement learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA: AAAI, 2016.
57 Wang Z Y, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, NY, USA: ICML, 2016.

58 Mnih V, Badia A P, Mirza M, Graves A, Lillicrap T P, Harley T, Silver D, Kavukcuoglu K. Asynchronous methods for deep reinforcement learning. Preprint arXiv:1602.01783, 2016.

59 Rusu A A, Colmenarejo S G, Gulcehre C, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R. Policy distillation. In: Proceedings of the 2016 International Conference on Learning Representations. San Juan, Puerto Rico: ICLR, 2016.

60 Parisotto E, Ba J L, Salakhutdinov R. Actor-mimic: deep multitask and transfer reinforcement learning. In: Proceedings of the 2016 International Conference on Learning Representations. San Juan, Puerto Rico: ICLR, 2016.

61 Clark C, Storkey A. Training deep convolutional neural networks to play Go. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015.

62 Maddison C J, Huang A, Sutskever I, Silver D. Move evaluation in Go using deep convolutional neural networks. In: Proceedings of the 2014 International Conference on Learning Representations. Banff, Canada: ICLR, 2014.

63 Tian Y D, Zhu Y. Better computer Go player with neural network and long-term prediction. In: Proceedings of the 2016 International Conference on Learning Representations. San Juan, Puerto Rico: ICLR, 2016.

64 Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, Dieleman S, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489

65 Bowling M, Burch N, Johanson M, Tammelin O. Heads-up limit hold'em poker is solved. Science, 2015, 347(6218): 145-149

66 Yakovenko N, Cao L L, Raffel C, Fan J. Poker-CNN: a pattern learning strategy for making draws and bets in poker games. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA: AAAI, 2016.

67 Heinrich J, Lanctot M, Silver D. Fictitious self-play in extensive-form games. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015.

68 Schaeffer J, Lake R, Lu P, Bryant M. CHINOOK: the world man-machine checkers champion. AI Magazine, 1996, 17(1): 21-29

GUO Xiao-Xiao Ph.D. candidate in the Department of Electrical Engineering and Computer Science, University of Michigan. His research interest covers deep learning and deep reinforcement learning. E-mail: guoxiao@umich.edu

LI Cheng Ph.D. candidate at the School of Information, University of Michigan. Her research interest covers data mining and information retrieval. E-mail: lichengz@umich.edu

MEI Qiao-Zhu Associate professor at the School of Information and the Department of Electrical Engineering and Computer Science (EECS), University of Michigan. His research interest covers large-scale data mining, information retrieval, and machine learning. Corresponding author of this paper. E-mail: qmei@umich.edu