A simple introduction to the Boltzmann machine

The cortex is the largest part of the brain of higher mammals. One notion about its purpose is that it represents a complex environment: the cortex contains a model of the environment, learned from experience. Such a model is necessary for a human to react appropriately to a complex situation. The Boltzmann machine is a mathematical model that makes it possible to capture this notion on a computer. In this neural network, all neurons are connected to each other by weighted synaptic connections, and via these weights they communicate their activations to one another. A neuron becomes active stochastically, taking into account the activations of the other neurons to which it is strongly connected. Under these dynamics, the activation states of the network follow a probability distribution, the Boltzmann distribution from thermodynamics, which gives this type of neural network its name. The activations of the input units, however, are constrained to match the observed stimuli (the data points).

The network has to represent the environment, and one task used to test this ability is the following: data shown to the input neurons during a "clamped phase" should later be generated freely by the net during a "free running phase". In the "free running phase" no data arrive at the input neurons, so they have to become active spontaneously; the stochastic nature of Boltzmann machine units makes this possible. The network is optimized for this task by learning suitable weight values. The mathematically derived learning rule involves both phases. In the "clamped phase" there is Hebbian learning: neurons that are active simultaneously strengthen their mutual weights, so activation states that occur often are stabilized. In the "free running phase" there is anti-Hebbian learning, which differs from Hebbian learning only by a negative sign: now activation states that occur often are destabilized. Learning is finished when the learning steps of the two phases cancel out, that is, when the network generates the same data distribution as the environment.
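To make the two-phase learning rule concrete, here is a minimal sketch, in Python, of a fully connected Boltzmann machine with binary stochastic units trained with the classic rule Δw_ij ∝ ⟨s_i s_j⟩_clamped − ⟨s_i s_j⟩_free. The class name, the fixed number of Gibbs sweeps, the learning rate, and the omission of bias terms are simplifications chosen here for illustration, not details taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BoltzmannMachine:
    """Fully connected Boltzmann machine with binary (0/1) stochastic units; biases omitted."""

    def __init__(self, n_visible, n_hidden):
        self.n_visible = n_visible
        self.n = n_visible + n_hidden
        # Symmetric weights, no self-connections.
        w = 0.01 * rng.standard_normal((self.n, self.n))
        self.w = (w + w.T) / 2.0
        np.fill_diagonal(self.w, 0.0)

    def _gibbs_sweep(self, s, clamp_visible=False):
        """One stochastic update of every unit; visible units can stay clamped to the data."""
        for i in rng.permutation(self.n):
            if clamp_visible and i < self.n_visible:
                continue  # input units keep showing the data point
            p_on = sigmoid(self.w[i] @ s)  # activation probability from weighted inputs
            s[i] = float(rng.random() < p_on)
        return s

    def train(self, data, epochs=50, lr=0.05, sweeps=5):
        n_hidden = self.n - self.n_visible
        for _ in range(epochs):
            clamped = np.zeros_like(self.w)  # Hebbian statistics <s_i s_j> with data shown
            free = np.zeros_like(self.w)     # anti-Hebbian statistics <s_i s_j> free running
            for v in data:
                # Clamped phase: input units show the data point, hidden units settle stochastically.
                s = np.concatenate([v, rng.integers(0, 2, n_hidden).astype(float)])
                for _ in range(sweeps):
                    s = self._gibbs_sweep(s, clamp_visible=True)
                clamped += np.outer(s, s)
                # Free running phase: nothing is clamped, the network generates on its own.
                s = rng.integers(0, 2, self.n).astype(float)
                for _ in range(sweeps):
                    s = self._gibbs_sweep(s)
                free += np.outer(s, s)
            dw = lr * (clamped - free) / len(data)  # Hebbian minus anti-Hebbian term
            np.fill_diagonal(dw, 0.0)
            self.w += dw
```

Once the clamped and free statistics agree on average, the two terms cancel and the weights stop changing, which is exactly the stopping condition described above.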

Generation of data in the "free running phase" happens through activation of the input neurons by the inner neurons, which project back to them. Each of these inner neurons generates an "elementary" component of the data point. This process can be structured by "sparse coding": a data point is then generated by a small number of "elementary" components chosen from a larger pool of components. The necessity for structured information processing arises from the sheer complexity of the brain; tasks like the recognition and grasping of an object by a robot are complex and must be structured. "Sparse coding" explains a data point by a small number of causes. For many data sets these causes occur at the same time, yet statistically independently. After a few processing stages, a picture that contains many pixels may be traced back to the presence of a few objects. If artificial neurons are trained on natural grey-valued images using "sparse coding", they develop properties known from neurons of the primary visual cortex: their connections to the input are limited to a small area and are shaped to capture edges of a particular orientation.

I applied "sparse coding" to the Boltzmann machine [1999, Sparsely Coding Boltzmann Machine] using neurons that favor the inactive state over the active state (sketched in the example below). Thus only a small number of neurons takes part in generating a data point. Furthermore, the inner neurons are not connected with each other, so in the "clamped phase" all inner neurons receive the same inputs from the shown data. If this phase alone were used in the learning rule, all inner neurons would develop the same connectivity to the input: as the most important "elementary" component of the data, they would all represent its principal component, the direction in data space with the largest variance. A division of labor among the inner neurons emerges through the "free running phase". Now the inner neurons can interact via their connections to the input neurons, which in this phase react to the back-projection from the inner neurons. Activation states that occur too often are trained away, so an inner neuron tries to generate possible data components that differ from components already learned. In this way variety emerges, along with a tendency to forget old memories and try something new.

From the example of the Boltzmann machine we see that artificial neural nets are not only models of information processing in the brain, but also offer attractive models for biological learning mechanisms. Vice versa, simple learning rules can provide access to the complex structures and functions of the brain.
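As a concrete, highly simplified illustration of the sparsely coding setup, the sketch below assumes that the preference for the inactive state is realized as a negative bias on the inner (hidden) neurons, that there are no connections among the inner neurons, and that the data are binary; the sizes, weight scale, and bias value are illustrative choices, not values from the 1999 paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 16 input (visible) units, 8 inner (hidden) units.
n_visible, n_hidden = 16, 8

# Only input-to-inner connections: the inner neurons are not connected with each other.
W = 0.1 * rng.standard_normal((n_hidden, n_visible))

# A negative bias makes the inactive state the preferred one, so only a few inner
# neurons take part in representing any single data point (the "sparse" part).
b_hidden = -2.0 * np.ones(n_hidden)

def respond_to_data(v):
    """Clamped phase: inner neurons respond stochastically to a shown data point v."""
    p_on = sigmoid(W @ v + b_hidden)
    return (rng.random(n_hidden) < p_on).astype(float)

def generate_data(h):
    """Free running phase step: inner neurons project back and generate a data point."""
    p_on = sigmoid(W.T @ h)
    return (rng.random(n_visible) < p_on).astype(float)

v = rng.integers(0, 2, n_visible).astype(float)  # a toy binary "data point"
h = respond_to_data(v)
print("active inner neurons:", int(h.sum()), "of", n_hidden)  # typically few, due to the negative bias
```

Whether such a bias is fixed or itself learned is a design choice; it is fixed here only to keep the sketch short.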
