Adam
echizen_tm
Apr.29, 2015
Adam
Adam(1p)
SGD(1p)
Adam(2p)
(3p)
Adam(1p)
Adam
state of the art
AdaGrad + RMSProp
(AdaGrad)
SGD
$g_t = \nabla_\theta f_t(\theta_{t-1})$
$\theta_t = \theta_{t-1} - \alpha g_t$
$f(\theta)$: objective function
$g$: gradient
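The SGD update can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `sgd`, the step size `alpha=0.1`, and the toy objective f(θ) = (θ − 3)² are all assumptions chosen for the example, and the gradient here is deterministic rather than sampled from mini-batches.

```python
def sgd(grad, theta, alpha=0.1, steps=100):
    """Repeat theta_t = theta_{t-1} - alpha * g_t for a fixed number of steps."""
    for _ in range(steps):
        g = grad(theta)            # g_t: gradient at the previous parameter value
        theta = theta - alpha * g  # move against the gradient, scaled by alpha
    return theta


# Hypothetical objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_star = sgd(lambda th: 2.0 * (th - 3.0), theta=0.0)
```

Note that the step length is `alpha * g`, so it depends directly on the scale of the gradient.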
Adam
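$g_t = \nabla_\theta f_t(\theta_{t-1})$
$\theta_t = \theta_{t-1} - \alpha \, E[g] / \sqrt{E[g^2]}$
$f(\theta)$: objective function
$g$: gradient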
Adam
$\theta_t = \theta_{t-1} - \alpha \, E[g] / \sqrt{E[g^2]}$
abs()
()
1
$E[g] / \sqrt{E[g^2]}$
$E[g] / \sqrt{E[g^2]}$
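A small numerical check may help show why the ratio E[g] / sqrt(E[g^2]) is a convenient step scale. Because E[g]^2 is never larger than E[g^2], the ratio always lies between -1 and 1. The sketch below uses plain sample means in place of the expectations; the helper name `mean_ratio` and the gradient sequences are made up for illustration.

```python
import math


def mean_ratio(grads):
    """Return E[g] / sqrt(E[g^2]), with both expectations taken as sample means."""
    n = len(grads)
    first = sum(grads) / n                   # estimate of E[g]
    second = sum(g * g for g in grads) / n   # estimate of E[g^2]
    return first / math.sqrt(second)


consistent = mean_ratio([2.0, 2.0, 2.0])     # gradients agree in sign and size
noisy = mean_ratio([2.0, -2.0, 2.0, -2.0])   # gradients cancel each other out
mixed = mean_ratio([1.0, 3.0, -2.0])         # somewhere in between
```

When the gradients agree, the ratio is at its maximum magnitude of 1; when they cancel, it drops to 0, so the step shrinks on noisy directions.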
m=0
(100)
m_1 = (0 + 100) / 2 = 50
m_2 = (0 + 100 + 100) / 3 ≈ 66.7
m_3 = (0 + 100 + 100 + 100) / 4 = 75
(0)
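The arithmetic above can be reproduced with a short sketch. It assumes, as the numbers on the slide do, that the initial value 0 is counted as one extra sample in the average; the function name `running_means` is hypothetical.

```python
def running_means(observations, initial=0.0):
    """Running average in which the initial value counts as one extra sample."""
    total = initial
    means = []
    for k, x in enumerate(observations, start=1):
        total += x
        means.append(total / (k + 1))  # k observations plus the initial value
    return means


# Every observation is 100, yet the average starts at 50 and only slowly approaches 100.
means = running_means([100.0, 100.0, 100.0])
```

The estimates stay below 100 because the zero initial value keeps pulling the average down.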
(Exponential Moving Average)
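$m_t = \beta m_{t-1} + (1 - \beta) g_t$
$m_t = (1 - \beta) \sum_{i=1}^{t} \beta^{t-i} g_i$
$$
\begin{aligned}
E[m_t] &= E\left[ (1 - \beta) \sum_{i=1}^{t} \beta^{t-i} g_i \right] \\
&\approx E[g_t] (1 - \beta) \sum_{i=1}^{t} \beta^{t-i} \\
&= E[g_t] \left( \sum_{i=1}^{t} \beta^{t-i} - \sum_{i=0}^{t-1} \beta^{t-i} \right) \\
&= E[g_t] (1 - \beta^t)
\end{aligned}
$$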
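The result of this derivation is that an exponential moving average started at zero underestimates E[g_t] by a factor of (1 − β^t), so dividing by that factor removes the bias. A minimal sketch, assuming a constant input of 100 and a decay rate of 0.9 (both chosen only for illustration; the function name is hypothetical):

```python
def ema_with_correction(values, beta=0.9):
    """Return (raw, corrected) exponential moving averages, starting from m_0 = 0."""
    m = 0.0
    out = []
    for t, g in enumerate(values, start=1):
        m = beta * m + (1.0 - beta) * g   # m_t = beta * m_{t-1} + (1 - beta) * g_t
        m_hat = m / (1.0 - beta ** t)     # divide out the (1 - beta^t) bias factor
        out.append((m, m_hat))
    return out


pairs = ema_with_correction([100.0] * 5)
```

With a constant input the raw average starts near 10 and climbs slowly, while the corrected value stays at 100 (up to floating-point rounding) from the first step.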
Adam
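$g_t = \nabla_\theta f_t(\theta_{t-1})$
$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$
$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$
$\hat{m}_t = m_t / (1 - \beta_1^t)$
$\hat{v}_t = v_t / (1 - \beta_2^t)$
$\theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / \sqrt{\hat{v}_t}$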
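The update rules above translate almost line for line into code. The following is a scalar sketch, not a reference implementation: the small constant `eps` in the denominator is an addition taken from the Adam paper to avoid division by zero and does not appear in the formulas above, and the default hyperparameters and the toy objective f(θ) = (θ − 3)² are assumptions for illustration.

```python
import math


def adam(grad, theta, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Scalar Adam: bias-corrected moving averages of g and g^2 set the step."""
    m = 0.0  # moving average of the gradient, estimates E[g]
    v = 0.0  # moving average of the squared gradient, estimates E[g^2]
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initial value
        v_hat = v / (1 - beta2 ** t)
        # eps is an assumption added here; the formulas above omit it.
        theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Hypothetical objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def toy_grad(th):
    return 2.0 * (th - 3.0)


first_step = adam(toy_grad, theta=0.0, alpha=0.1, steps=1)
theta_star = adam(toy_grad, theta=0.0, alpha=0.01, steps=5000)
```

On the first step the bias-corrected ratio is g / |g|, so the parameter moves by almost exactly `alpha` regardless of how large the gradient is.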
Adam
$E[g] / \sqrt{E[g^2]}$