
AdaGrad+RDA
echizen_tm
Oct. 11, 2014


Agenda
  Introduction (3p)
  Stochastic Gradient Descent (2p)
  AdaGrad+RDA (6p)
  About the name "AdaGrad+RDA" (3p)
  Summary (1p)

Introduction (1/3)

An input is represented as a set of features, for example:

  {…, …}
  {…, …, …, …, …}
  {10, 20, 30, 40, …}

Introduction (2/3)

  Feature vector: x
  Weight vector: w

The classifier computes the score

  y = \sum_i x_i w_i

and predicts class A if y > 0, class B if y <= 0.

Introduction (3/3)

Example:

  x = {…: 1, …: 1, …: 1, …: 1}
  w = {…: 1, …: 1, …: 1, …: -1}
  y = 1*1 + 1*1 + 1*1 + 1*(-1)
    = 2 > 0

so this input is classified as A.

The correct label is written t: t = 1 for class A, t = -1 for class B.
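As a minimal sketch of this scoring rule, the feature and weight vectors can be held as dictionaries keyed by feature name; the names f1..f4 below are placeholders, not the example's actual features.

  # Linear classifier from the introduction; feature names are placeholders.
  def score(x, w):
      # y = sum_i x_i * w_i over the features present in x
      return sum(value * w.get(feature, 0.0) for feature, value in x.items())

  x = {"f1": 1, "f2": 1, "f3": 1, "f4": 1}
  w = {"f1": 1, "f2": 1, "f3": 1, "f4": -1}

  y = score(x, w)                     # 1*1 + 1*1 + 1*1 + 1*(-1) = 2
  print(y, "A" if y > 0 else "B")     # 2 A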

Stochastic Gradient Descent (1/2)

Learn the weight vector w from labeled examples (x, t) by minimizing a loss function f(w, x, t).

Hinge loss:

  f(w, x, t) = \max(0, 1 - t \sum_i x_i w_i)

Squared loss:

  f(w, x, t) = \frac{1}{2} \left( t - \sum_i x_i w_i \right)^2
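A short sketch of these two losses in Python, reusing the dictionary representation from the introduction (function names are illustrative):

  def score(x, w):
      return sum(v * w.get(k, 0.0) for k, v in x.items())

  def hinge_loss(w, x, t):
      # max(0, 1 - t * sum_i x_i w_i)
      return max(0.0, 1.0 - t * score(x, w))

  def squared_loss(w, x, t):
      # (1/2) * (t - sum_i x_i w_i)^2
      return 0.5 * (t - score(x, w)) ** 2

  print(hinge_loss({"f1": 0.5}, {"f1": 1}, 1))   # 0.5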

Stochastic Gradient Descent (2/2)

To find the w that minimizes the loss, use Stochastic Gradient Descent (SGD).

Stochastic Gradient Descent:

  η = learning rate

  w = 0;
  for ((x,t) in X) {
    w -= η * ∇f(w, x, t);
  }

Gradient of the hinge loss (when 1 - t \sum_i x_i w_i > 0, otherwise 0):

  \partial f(w, x, t) / \partial w_i = -t x_i

Gradient of the squared loss:

  \partial f(w, x, t) / \partial w_i = -\left( t - \sum_i x_i w_i \right) x_i
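A runnable sketch of this loop with the hinge loss, again using sparse dictionaries; the learning rate value and feature names are only illustrative:

  def score(x, w):
      return sum(v * w.get(k, 0.0) for k, v in x.items())

  def sgd_train(examples, eta=0.1):
      # examples: iterable of (x, t) with x a dict and t in {+1, -1}
      w = {}
      for x, t in examples:
          # hinge-loss gradient is -t * x_i when the margin is violated, else 0
          if 1.0 - t * score(x, w) > 0.0:
              for k, v in x.items():
                  w[k] = w.get(k, 0.0) - eta * (-t * v)   # w -= eta * grad
      return w

  examples = [({"f1": 1, "f2": 1}, 1), ({"f2": 1, "f3": 1}, -1)]
  print(sgd_train(examples))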

AdaGrad+RDA (1/6)

Besides SGD there are other online learning methods such as AROW and SCW.
This section introduces AdaGrad+RDA, a method in the same family as AROW and SCW.

Difference between SGD and AdaGrad+RDA:
  SGD: builds w_{s+1} from the current weights w_s and the example x at step s.
  AdaGrad+RDA: builds w_{s+1} from all the data seen from step 0 through step s.

AdaGrad+RDA (2/6)

AdaGrad+RDA chooses w_{s+1} by minimizing the following regret-based objective:

  R(w_{s+1}) = \sum_i g_i w_{s+1,i} + \lambda \| w_{s+1} \|_1 + \frac{1}{2} \sum_i h_i w_{s+1,i}^2

where

  g_i = \frac{1}{s} \sum_{j=0}^{s} \partial f(w_j, x_j, t_j) / \partial w_{j,i}

  h_i = \frac{1}{s} \sqrt{ \sum_{j=0}^{s} \left( \partial f(w_j, x_j, t_j) / \partial w_{j,i} \right)^2 }

AdaGrad+RDA (3/6)

  R(w_{s+1}) = \sum_i g_i w_{s+1,i} + \lambda \| w_{s+1} \|_1 + \frac{1}{2} \sum_i h_i w_{s+1,i}^2

The regret involves four quantities: w, g, h, and λ.
  w: the weight vector to be found at step s+1
  g, h: computed from the gradients \partial f of the loss
  λ: regularization coefficient (a hyperparameter)
Once g and h are known, the regret can be minimized to obtain w.

AdaGrad+RDA (4/6)

  g_i = \frac{1}{s} \sum_{j=0}^{s} \partial f(w_j, x_j, t_j) / \partial w_{j,i}

  h_i = \frac{1}{s} \sqrt{ \sum_{j=0}^{s} \left( \partial f(w_j, x_j, t_j) / \partial w_{j,i} \right)^2 }

Both g and h can be maintained incrementally: at each step it is enough to add the new gradient
\partial f(w_j, x_j, t_j) / \partial w_{j,i} (for g) and its square (for h) to running sums.
In particular, g is simply the average of the gradients seen so far.
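A minimal sketch of that bookkeeping, with one running sum per feature (the variable names are just for illustration):

  # Running sums from which g and h are derived, updated once per example.
  grad_sum = {}     # sum_j  df/dw_i      ->  g_i = grad_sum[i] / s
  grad_sq_sum = {}  # sum_j (df/dw_i)^2   ->  h_i = sqrt(grad_sq_sum[i]) / s

  def accumulate(grad):
      # grad: dict feature -> gradient of the loss at the current step
      for i, gi in grad.items():
          grad_sum[i] = grad_sum.get(i, 0.0) + gi
          grad_sq_sum[i] = grad_sq_sum.get(i, 0.0) + gi * gi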

AdaGrad+RDA (5/6)

Setting \partial R(w) / \partial w = 0 gives a closed-form solution w = r(λ, g, h).

  λ = regularization coefficient

  w = 0;
  for ((x,t) in X) {
    update g with ∇f(w, x, t);
    update h with ∇f(w, x, t);
    w = r(λ, g, h);
  }

AdaGrad+RDA (6/6)

Solving \partial R(w) / \partial w = 0 gives w = r(λ, g, h) coordinate by coordinate:

  |g_i| <= λ :  w_i = 0
  g_i > λ    :  w_i = -(g_i - λ) / h_i
  g_i < -λ   :  w_i = -(g_i + λ) / h_i
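Putting the last two slides together, here is a runnable Python sketch of the whole loop with the hinge loss; variable names, the λ value, and the sample data are illustrative and not taken from the author's implementation (linked in the summary).

  # AdaGrad+RDA sketch: accumulate gradient statistics, then rebuild w
  # in closed form after every example. Hinge loss; sparse dict vectors.
  import math

  def score(x, w):
      return sum(v * w.get(k, 0.0) for k, v in x.items())

  def adagrad_rda_train(examples, lam=0.1):
      w, grad_sum, grad_sq_sum = {}, {}, {}
      s = 0
      for x, t in examples:
          s += 1
          # Hinge-loss gradient: -t * x_i when the margin is violated, else 0.
          if 1.0 - t * score(x, w) > 0.0:
              for k, v in x.items():
                  gk = -t * v
                  grad_sum[k] = grad_sum.get(k, 0.0) + gk
                  grad_sq_sum[k] = grad_sq_sum.get(k, 0.0) + gk * gk
          # Closed-form update w = r(lam, g, h) for every feature seen so far.
          for k in grad_sum:
              g = grad_sum[k] / s                   # average gradient
              h = math.sqrt(grad_sq_sum[k]) / s     # adaptive (proximal) term
              if abs(g) <= lam or h == 0.0:
                  w[k] = 0.0
              elif g > lam:
                  w[k] = -(g - lam) / h
              else:
                  w[k] = -(g + lam) / h
      return w

  examples = [({"f1": 1, "f2": 1}, 1), ({"f2": 1, "f3": 1}, -1)]
  print(adagrad_rda_train(examples))

Because g is an average, a feature whose average gradient stays within ±λ is driven exactly to zero, which is how the L1 term produces sparse weights.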

About the name AdaGrad+RDA (1/3)

Where does the name AdaGrad+RDA come from?

AdaGrad = Adaptive Gradient
  The update is adapted per feature using the gradients seen so far
  (the same idea of per-feature adaptation as in AROW and SCW).

RDA = Regularized Dual Averaging
  Regularized: a regularization term is added to the objective.
  Dual Averaging: the objective is built from the average of the past gradients.

About the name AdaGrad+RDA (2/3)

Each term of the regret corresponds to a part of the name:

  R(w_{s+1}) = \sum_i g_i w_{s+1,i} + \lambda \| w_{s+1} \|_1 + \frac{1}{2} \sum_i h_i w_{s+1,i}^2

  loss-function term   \sum_i g_i w_{s+1,i}                   -> Dual Averaging
  regularization term  \lambda \| w_{s+1} \|_1                -> Regularized
  proximal term        \frac{1}{2} \sum_i h_i w_{s+1,i}^2     -> Adaptive Gradient

About the name AdaGrad+RDA (3/3)

Why "Dual Averaging"? Consider the point w_{s+1} against which the regret over steps 0..s, bounded through the gradients f'_j of f(w_j, x_j, t_j), is largest:

  \max_{w_{s+1}} \sum_{j=0}^{s} \langle f'_j, w_j - w_{s+1} \rangle
    = \max_{w_{s+1}} \left( \sum_{j=0}^{s} \langle f'_j, w_j \rangle - \sum_{j=0}^{s} \langle f'_j, w_{s+1} \rangle \right)
    = \min_{w_{s+1}} \sum_{j=0}^{s} \langle f'_j, w_{s+1} \rangle          (the first sum does not depend on w_{s+1})
    = \min_{w_{s+1}} \langle \sum_{j=0}^{s} f'_j / s, \; w_{s+1} \rangle   (dividing by s does not change the minimizer)
    = \min_{w_{s+1}} \langle g, w_{s+1} \rangle
    = \min_{w_{s+1}} \sum_i g_i w_{s+1,i}

So the loss-function term of R asks w_{s+1} to minimize the inner product with the averaged ("dual averaged") gradient g.

Summary (1/1)

  Reviewed Stochastic Gradient Descent (SGD).
  Explained how AdaGrad+RDA works.
  Implemented AdaGrad+RDA (https://github.com/echizentm/AdaGrad).

References:
  Duchi et al. (2010), Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
  Xiao (2010), Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization.
