
Introduction to Restricted Boltzmann Machines

Emmanuel Anguiano-Hernández
LabTL // CCC // INAOE

January 19, 2013

Content

1. Boltzmann Distribution
2. The Boltzmann Machine
3. Restricted Boltzmann Machine
4. Contrastive Divergence
5. RBM with Temporal Restrictions

Boltzmann Distribution
The Boltzmann distribution (BD) is a probability distribution over the states of a system. The fraction of particles $N_i / N$ occupying a set of states $i$ possessing energy $E_i$ is:

$$\frac{N_i}{N} = \frac{g_i \, e^{-E_i / (k_B T)}}{Z(T)}$$

$k_B$ is the Boltzmann constant
$T$ is the temperature
$g_i$ is the degeneracy, the number of levels with energy $E_i$
$N$ is the total number of particles
$Z(T)$ is the partition function:

$$Z(T) = \sum_i g_i \, e^{-E_i / (k_B T)}$$
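As a quick numerical illustration of the formula above, the following minimal sketch computes the fractions $N_i / N$ for a small set of energy levels. The function name `boltzmann_fractions` and the two-level example system are assumptions made for illustration; they are not part of the original slides.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def boltzmann_fractions(energies, degeneracies, temperature):
    """Return the fraction N_i / N for each level (energies in joules)."""
    # Unnormalized weight of each level: g_i * exp(-E_i / (k_B * T))
    weights = [g * math.exp(-e / (K_B * temperature))
               for e, g in zip(energies, degeneracies)]
    z = sum(weights)  # partition function Z(T)
    return [w / z for w in weights]


# Hypothetical two-level system at 300 K: a ground level and one level
# exactly k_B * T above it, both non-degenerate (g_i = 1).
temperature = 300.0
levels = [0.0, K_B * temperature]
fractions = boltzmann_fractions(levels, [1, 1], temperature)
print(fractions)
```

Because the weights are divided by $Z(T)$, the returned fractions always sum to one, and the lower-energy level is always the more populated one at any finite temperature.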

Boltzmann Machines
Stochastic recurrent neural network
Binary units
Energy value defined for the network
Can learn internal representations and solve difficult combinatorial problems (given enough time)
Not practical if connectivity is unrestricted

Boltzmann Machines
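Global energy for a BM is:

$$E = -\sum_{i<j} w_{ij} s_i s_j - \sum_i \theta_i s_i$$

$w_{ij}$ is the strength of the connection between unit $i$ and unit $j$
$s_i$ is the state of unit $i$, $s_i \in \{0, 1\}$
$\theta_i$ is the threshold of unit $i$

The difference in global energy between unit $i$ being off (0) and being on (1) is:

$$\Delta E_i = \sum_j w_{ij} s_j + \theta_i = E_{i=\text{off}} - E_{i=\text{on}}$$

The probability of unit $i$ being on is:

$$p_{i=\text{on}} = \frac{1}{1 + \exp\left(-\frac{\Delta E_i}{T}\right)}$$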

At thermal equilibrium, the probability of any global state s when the network is free-running is given by the Boltzmann Distribution
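
The relation between the global energy, the energy gap of one unit, and its probability of being on can be checked numerically. The sketch below assumes symmetric weights with a zero diagonal and the sign convention $E = -\sum_{i<j} w_{ij} s_i s_j - \sum_i \theta_i s_i$; the function names and the three-unit example network are hypothetical.

```python
import math


def global_energy(w, theta, s):
    """E = -sum_{i<j} w_ij s_i s_j - sum_i theta_i s_i."""
    n = len(s)
    pair_term = sum(w[i][j] * s[i] * s[j]
                    for i in range(n) for j in range(i + 1, n))
    unit_term = sum(theta[i] * s[i] for i in range(n))
    return -pair_term - unit_term


def energy_gap(w, theta, s, i):
    """Delta E_i = sum_j w_ij s_j + theta_i (the state of unit i is ignored)."""
    return sum(w[i][j] * s[j] for j in range(len(s)) if j != i) + theta[i]


def prob_on(w, theta, s, i, temperature=1.0):
    """Probability that unit i turns on, given the states of the other units."""
    return 1.0 / (1.0 + math.exp(-energy_gap(w, theta, s, i) / temperature))


# Hypothetical three-unit network (symmetric weights, no self-connections).
w = [[0.0, 1.0, -0.5],
     [1.0, 0.0, 0.25],
     [-0.5, 0.25, 0.0]]
theta = [0.1, -0.2, 0.0]

# The gap of unit 1 equals E(unit 1 off) - E(unit 1 on), other units fixed.
state_off = [1, 0, 1]
state_on = [1, 1, 1]
gap_from_energies = global_energy(w, theta, state_off) - global_energy(w, theta, state_on)
gap_direct = energy_gap(w, theta, state_off, 1)
p_on = prob_on(w, theta, state_off, 1)
print(gap_from_energies, gap_direct, p_on)
```

Computing the gap directly needs only the weights attached to unit $i$, which is why a unit can be updated locally without evaluating the full energy.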

Restricted Boltzmann Machines

Bipartite graph, no intralayer connections
Visible binary units act like inputs, hidden units can capture latent factors
Can be analyzed with factor analysis

Restricted Boltzmann Machines


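Global energy for a given configuration $(v, h)$ is:

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{ij} h_j$$

In matrix form:

$$E(v, h) = -a^T v - b^T h - v^T W h$$

$W = (w_{ij})$ is the matrix of weights, with rows indexed by visible units $i$ and columns by hidden units $j$
$h$, $v$ are the hidden and visible unit states
$a$, $b$ are the bias weights for the visible and hidden units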

The joint probability over hidden and visible vectors is defined in terms of the energy:

$$P(v, h) = \frac{1}{Z} e^{-E(v, h)}$$

where $Z$ is the partition function, the sum of $e^{-E(v, h)}$ over all configurations. The marginal probability of a given visible configuration is a sum over all hidden configurations:

$$P(v) = \frac{1}{Z} \sum_h e^{-E(v, h)}$$
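
For a very small RBM, the marginal $P(v)$ can be computed exactly by enumerating every configuration. The sketch below is only an illustration under that assumption: it stores the weights as `w[i][j]` with `i` a visible unit and `j` a hidden unit, and the function names and parameter values are hypothetical. Enumeration grows exponentially with the number of units, so this is not how realistic models are handled.

```python
import itertools
import math


def rbm_energy(v, h, a, b, w):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_i sum_j v_i w_ij h_j."""
    visible_term = sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    interaction = sum(v[i] * w[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def marginal_visible(a, b, w):
    """Return {v: P(v)} by brute-force enumeration (tiny models only)."""
    n_visible, n_hidden = len(a), len(b)
    unnormalized = {}
    for v in itertools.product([0, 1], repeat=n_visible):
        # Sum exp(-E(v, h)) over all hidden configurations h.
        unnormalized[v] = sum(math.exp(-rbm_energy(v, h, a, b, w))
                              for h in itertools.product([0, 1], repeat=n_hidden))
    z = sum(unnormalized.values())  # partition function Z
    return {v: p / z for v, p in unnormalized.items()}


# Hypothetical RBM with two visible units and one hidden unit.
a = [0.0, 0.5]
b = [-0.3]
w = [[1.0], [-1.0]]
p_v = marginal_visible(a, b, w)
for config, prob in sorted(p_v.items()):
    print(config, prob)
```

With all parameters set to zero, every configuration has energy zero and the marginal over the visible units becomes uniform, which is a convenient sanity check.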

Restricted Boltzmann Machines


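Because the graph is bipartite, the units within a layer are conditionally independent given the other layer. With $m$ visible units and $n$ hidden units:

$$P(v \mid h) = \prod_{i=1}^{m} P(v_i \mid h)$$

$$P(h \mid v) = \prod_{j=1}^{n} P(h_j \mid v)$$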

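The activation probabilities of individual units are:

$$P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_{i=1}^{m} w_{ij} v_i\right)$$

$$P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_{j=1}^{n} w_{ij} h_j\right)$$

where $m$ is the number of visible units, $n$ is the number of hidden units, and $\sigma$ is the sigmoid function.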
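
The two conditional activation formulas translate directly into code. The following sketch assumes the weights are stored as `w[i][j]`, with `i` indexing visible units and `j` indexing hidden units; the function names and the example values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, b, w):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(w[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, a, w):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


# Hypothetical RBM with three visible units and two hidden units.
w = [[0.5, -1.0],
     [1.5, 0.0],
     [-0.5, 2.0]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.2]

v = [1, 0, 1]
p_h = hidden_probs(v, b, w)
p_v = visible_probs([1, 0], a, w)
print(p_h, p_v)
```

Each probability depends only on the states of the opposite layer, so all units of one layer can be evaluated in a single pass.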



Contrastive Divergence

State Activation:
To update the state of some units given the others, assume we know the weights.
Compute the activation energy of unit $i$: $a_i = \sum_j w_{ij} x_j$
Then turn unit $i$ on with probability $p_i = \sigma(a_i)$, where $\sigma(x) = \frac{1}{1 + e^{-x}}$
And turn it off with probability $1 - p_i$
Positive connections push the connected units toward the same state
Negative connections push them toward different states
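
The stochastic update of a single unit can be sketched as follows. The helper name `sample_unit`, the example weights, and the fixed random seed are assumptions for illustration; the loop at the end only checks empirically that the unit turns on with a frequency close to $p_i$.

```python
import math
import random


def sample_unit(weights_to_unit, neighbour_states, rng):
    """Sample one unit's binary state; return (state, probability of being on)."""
    # Activation energy: a_i = sum_j w_ij * x_j
    activation = sum(w * x for w, x in zip(weights_to_unit, neighbour_states))
    p_on = 1.0 / (1.0 + math.exp(-activation))
    state = 1 if rng.random() < p_on else 0
    return state, p_on


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
weights_to_unit = [2.0, -1.0, 0.5]   # hypothetical weights into unit i
neighbour_states = [1, 1, 0]         # states x_j of the connected units

n_samples = 10000
n_on = 0
for _ in range(n_samples):
    state, p_on = sample_unit(weights_to_unit, neighbour_states, rng)
    n_on += state
frequency = n_on / n_samples
print(p_on, frequency)
```

A neighbour that is on and connected through a positive weight raises the activation and therefore the chance of the unit being on, while a negative weight lowers it.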

Contrastive Divergence
Learning weights:
For each example:
1. Put the example in the visible units.
2. Update the states of the hidden units using the logistic activation, then for each edge $e_{ij}$ compute $Pos(e_{ij}) = x_i \cdot x_j$.
3. Reconstruct the visible units: for each one compute $a_i$ and update its state. Then update the hidden units again and compute $Neg(e_{ij}) = x_i \cdot x_j$.
4. Update the weight of each edge $e_{ij}$: $w_{ij} = w_{ij} + L \cdot (Pos(e_{ij}) - Neg(e_{ij}))$, where $L$ is the learning rate.
Repeat over all examples.
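
The learning procedure above can be sketched in plain Python as one step of contrastive divergence per example. This is a minimal illustration, not a reference implementation: it follows the slide in omitting bias terms, and the small random weight initialization, the `epochs` and `seed` parameters, and the toy data set are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, w, rng):
    """Sample hidden states given visible states (w[i][j]: visible i, hidden j)."""
    return [1 if rng.random() < sigmoid(sum(w[i][j] * v[i] for i in range(len(v)))) else 0
            for j in range(len(w[0]))]


def sample_visible(h, w, rng):
    """Sample visible states given hidden states."""
    return [1 if rng.random() < sigmoid(sum(w[i][j] * h[j] for j in range(len(h)))) else 0
            for i in range(len(w))]


def cd1_update(example, w, learning_rate, rng):
    """Apply one contrastive divergence update to w in place."""
    v0 = list(example)              # put the example in the visible units
    h0 = sample_hidden(v0, w, rng)  # positive phase
    v1 = sample_visible(h0, w, rng) # reconstruct the visible units
    h1 = sample_hidden(v1, w, rng)  # negative phase
    for i in range(len(w)):
        for j in range(len(w[0])):
            pos = v0[i] * h0[j]     # Pos(e_ij)
            neg = v1[i] * h1[j]     # Neg(e_ij)
            w[i][j] += learning_rate * (pos - neg)


def train(examples, n_hidden, epochs=50, learning_rate=0.1, seed=0):
    """Repeat the update over all examples for a number of epochs."""
    rng = random.Random(seed)
    n_visible = len(examples[0])
    # Assumption: small Gaussian initial weights.
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
    for _ in range(epochs):
        for example in examples:
            cd1_update(example, w, learning_rate, rng)
    return w


# Hypothetical toy data: two complementary binary patterns.
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
weights = train(data, n_hidden=2)
for row in weights:
    print(row)
```

When the reconstruction reproduces the example exactly and the hidden states agree, `pos` and `neg` cancel and the weights stop changing, which is the fixed point the update is driving toward.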

RBM with Temporal Restrictions

Some units could be temporal
Weights for these temporal units can contain a decay function over time or over the number of examples
Experiments on how to integrate temporally decaying weights must be done
Contrastive divergence should be extended in order to accept temporal units