
Vergence Control in a Binocular Vision System using

Weightless Neural Networks


Karin Satie Komati and Alberto Ferreira De Souza
Departamento de Informática – Universidade Federal do Espírito Santo
Av. Fernando Ferrari, S/N, 29060-970 – Vitória – ES – Brazil, Tel. +55 27 3335 2641, Fax +55 27 3335 2650
{karin, alberto}@inf.ufes.br

Abstract

This paper presents a methodology that we have developed and used to implement an artificial binocular vision system capable of emulating vergence eye movements. This methodology involves using Weightless Neural Networks (WNNs) as building blocks of artificial vision architectures. Using the proposed methodology, we have designed several WNN based architectures in which images captured by virtual cameras are used for controlling the position of these cameras' "fovea" (the high resolution region of the cameras' image). Our best architecture is able to control the foveae vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

1. Introduction

Our visual perception of the world is egocentric, i.e., centered on our perceived position in relation to the environment. Normally, we do not even notice that. It would be awful, however, if our perception were oculocentric, i.e., centered on the position of our eyes in relation to the environment. In that case, we would perceive the world moving with the movements of our eyes. Nevertheless, in an apparent paradox, the brain areas that first analyze the attributes of the visual input possess an accurate oculocentric neural organization [1]. Still, when moving our eyes, we feel that the exterior world is steady. This shows that human visual perception depends not only on the image on the retina, but also on knowledge of the position of the eyes, head, and body. Therefore, the oculomotor system influences visual perception, and information regarding its status is important for visual perception of the world [2].

This paper presents the first step of a larger research effort whose main objective is to implement a system, based on artificial Weightless Neural Networks (WNNs [3, 4, 10]), capable of emulating the human biological visual system's capacity for creating a steady internal representation of the environment. We believe that the oculomotor system plays a significant role in building this internal representation, since it is responsible for the movements of the eyes, which allow us to inspect the environment in order to create this representation. We have therefore started our research effort by modeling the oculomotor system. We have chosen WNNs as building blocks for this modeling because they are simple but powerful, highly flexible, and have fast learning algorithms, which allow the study of large networks using currently available standard computers.

In this paper we present a system that models the human vergence oculomotor subsystem [2]. Vergence refers to the disjunctive eye movements whereby the eyes move symmetrically in opposite directions to ensure that, for the object of interest, the image in each eye falls on the fovea (the central part of the retina) [1].

Figure 1. Simple neural architecture

In our implementation, images captured by virtual cameras are used for controlling the position of these cameras' "fovea". The control signals for positioning the foveae are generated by WNNs, which receive as input the images captured by the cameras. To implement this vergence system, we have used current knowledge about the neurophysiology of the biological visual system and a software tool we have developed to emulate WNN architectures. This tool, named MAE (a Portuguese acronym for Event Associative Machine), allows the simulation of complex WNN architectures with hundreds of thousands of neurons on a standard PC running Linux. An example architecture of the vergence control system we have studied using MAE is shown in Figure 1, while Figure 2 shows a screen shot of MAE running the neural architecture of Figure 1.
In Figure 1, Image_left and Image_right are images captured by two virtual cameras positioned in distinct places in the virtual space to emulate the physical separation of the eyes. The symbol ƒ denotes two-dimensional filters, while the other two symbols denote two-dimensional neural layers and outputs that allow the MAE user to observe the status of a neural layer.

Figure 2. MAE screen shot of a simple neural architecture

We have implemented and tested several neural architectures like the one shown in Figure 1, using filters and trained neural layers to emulate neurophysiological properties observed in the human biological visual system. Our best performing architecture produces vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

Figure 3. Eye, LGN, V1

2. Biological Vision

Vision starts with two external sensors, the eyes. The retina, situated at the back of the eyeball, is where the image of the outside world is captured. The neurons of the retina transform light energy into electrical signals that are transmitted to the brain via the optic nerve. The retina's ganglion cells, which drive the optic nerve, can be separated into two categories: parvocellular (small cells) and magnocellular (large cells). These two categories of cells are the starting point of parallel pathways that can be identified throughout most parts of the brain devoted to visual processing [1].

2.1. Visual Pathways

The optic nerve connects to the lateral geniculate nucleus (LGN) of the thalamus, and the LGN to the visual cortex. The visual cortex is usually divided into five separate areas, V1-V5 [1]; the LGN connects to the primary visual cortex, or V1 (also called striate cortex, Figure 3). V1 possesses cells that detect several basic attributes of the visual stimuli. There are neurons that detect stimuli with a certain angular orientation, others that are sensitive to color, and others to disparities between the stimuli coming from each eye, etc.

From V1, the visual neuronal impulses are directed to V2. V2 projects to V3, V4, and V5, and each of these areas sends information to any of 20 or more other areas of the brain that process visual information. This general arrangement can be subdivided into three parallel pathways, which start in the parvocellular and magnocellular cells of the retina (in V1, the parvocellular pathway is divided into two, parvocellular interblob and parvocellular blob, as can be seen in Figure 4). Although each pathway is somewhat distinct in function, there is intercommunication between them.

The magnocellular cortical pathway starts in layer 4Cα of V1 (Figure 4). This layer projects to layer 4B, which then projects to the thick stripes of V2. Area V2 then projects to V3, to V5 (middle-temporal cortex, or MT), and to the medial superior temporal cortex (MST). This pathway is an extension of the magnocellular pathway from the retina and LGN, and processes visual detail leading to the perception of shape in area V3, and of movement or motion in areas V5 and MST. The retina's magnocellular ganglion neurons show high sensitivity to contrast, low spatial resolution, and high temporal resolution, i.e., fast, transient responses to visual stimuli. These characteristics make the magnocellular branch of the visual system especially suitable for quickly detecting novel or moving stimuli.

In the second visual pathway, neurons of the parvocellular interblob areas of V1 project to the pale stripes of V2, and these to the inferior temporal cortex (Figure 4). This pathway possesses several feature detectors (simple, complex and hypercomplex cells) and is responsible for complex form recognition.

Finally, in the third, mixed visual cortical pathway, neurons in the blobs of V1 project to the thin stripes of V2. The thin stripes of V2 then project to V4. Area V4 receives input from both the parvo- and magnocellular pathways of the retina and LGN. This parvocellular portion of V4 is particularly important for the perception of color and for the maintenance of color perception regardless of lighting (color constancy).
The retina's parvocellular ganglion neurons show low sensitivity to contrast, high spatial resolution, and low temporal resolution, i.e., sustained responses to visual stimuli. These cellular characteristics make the parvocellular visual pathways especially suitable for the analysis of detail in the visual world.

These separate parallel pathways decode/extract the elementary properties of the objects in the visual field, such as color, form and orientation, producing a rather complex internal representation of what we see. However, this representation is not captured instantly: we examine our surroundings all the time, focusing our attention on different parts of the field of view through the movement of our eyes. We believe that the mind creates this internal representation in a way similar to a painter working on his canvas, where the eyes serve as the painter's brush.

Figure 4. Visual parallel pathways

2.2. Oculomotor System

Several image-preprocessing operations are performed in the retina, and two are of interest for this work. First, the retina's ganglion cells act like contrast discriminators, being rather insensitive to uniform light. Second, the number of ganglion cells devoted to different areas of the retina is larger at the fovea and decreases towards the borders, following approximately a gaussian density distribution. This is perhaps one of the main reasons for the existence of ocular movements.

We have five types of eye movements that put the foveae on a target and keep them there. They are listed in Table 1 [2]. The first four are conjugate movements: both eyes move the same amount in the same direction. The fifth – vergence – is disconjugate: the eyes move in different directions and sometimes by different amounts.

Thanks to the vergence oculomotor subsystem, when we look at an object that is near or far from us, each eye moves differently to keep the image of the object aligned precisely on each fovea. If the object is near, the eyes converge; if the object is far, the eyes diverge. The difference in retinal position of an object in one eye compared to its position in the other is referred to as retinal disparity, and the detection of this disparity is important for vergence control.

Table 1. A Functional Classification of Eye Movements

Movement – Function

Movements that stabilize the eye when the head moves:
  Vestibulo-ocular – for brief or rapid head rotation
  Optokinetic – for sustained or slow head rotation

Movements that keep the fovea on a visual target:
  Saccade – brings new objects of interest onto the fovea
  Smooth pursuit – holds the image of a moving target on the fovea
  Vergence – adjusts the eyes for different viewing distances in depth

2.3. Disparity Sensitivity in the Visual System

Numerous studies on monkeys have shown that a large population of visual cells is capable of encoding the amplitude and sign of horizontal disparities. Horizontal disparity detectors were found in the cortical visual areas V1, V2, V3, V5 and MST. According to their disparity tuning functions, which describe how these cells respond to stimuli with different disparities, these cells can be classified into several groups [5]. Figure 5 shows six such groups: tuned near (TN), tuned excitatory (TE), tuned far (TF), near (NE), tuned inhibitory (TI), and far (FA). We will only discuss the TN, TE and TF cells in more detail, because we have created models of them and used these models in our vergence control system.

Figure 5. Disparity detection cells [5]
TE cells have a disparity tuning function that displays binocular facilitation over a narrow range of disparities around zero and binocular suppression for negative or positive disparities (see Figure 5). That is, they have a positive peak at zero disparity, are symmetrical around zero disparity, and have a narrow tuning width. TN and TF cells have disparity tuning functions similar to those of TE cells, but peak at negative or positive disparities, respectively (Figure 5).

In addition to helping control vergence movements, disparity sensitive cells, together with cells that report the position of the eyes and others responsible for visual memory, are possibly important for creating internal representations of the form and distance of objects in the visual field [6].

3. Weightless Neural Networks

WNNs [3] are based on artificial neurons that have binary inputs and outputs and no adjustable weights between neurons. Each such neuron has a look-up table, which can be implemented using Random Access Memory (RAM). A neuron's binary input is the address of an entry of its look-up table, and the look-up table entry value is used to compute the neuron's output. Instead of adjusting weights, WNN training consists of changing the contents of the look-up table entries, which allows highly flexible and fast learning algorithms. The flexibility comes from the WNN neuron's capacity to implement any logical function of its inputs, and the fast learning from the mutual independence between look-up table entries, which permits changing an entry without interfering with the others.

In the mid 1960s, Igor Aleksander [7] suggested universal logic circuits, which can be implemented with RAM, as neurons of WNNs. After that, other weightless models were proposed but, until the late 1980s, all of them only generalized at the network level (individual neurons did not have generalization properties). To overcome that, Aleksander [8] suggested the inclusion of a spreading phase, responsible for generalizing information from the training set by storing clusters of extra, slightly changed input patterns having each training input as centroid. Generalizing neurons are known as Generalizing RAMs (GRAMs).

GRAMs are often implemented as Virtual GRAMs (VGRAMs) [9]. A VGRAM is a GRAM in which no space is allocated for patterns that are never found during the training phase, and generalization is achieved through calculation. During the training phase, a VGRAM neuron stores each input pattern and the corresponding desired output into its look-up table. During the recall phase, input patterns are compared with each input of the stored input-output pairs. If there is an exact match, the neuron responds with the stored output; otherwise, one of two possible outcomes occurs: (i) the neuron responds with the output of the nearest stored pattern, or (ii) with a random value if there is more than a single stored pattern at the same Hamming distance from the input.

VGRAM neurons can be made sensitive to color or shades of gray [10]. This is achieved using neuron inputs dedicated to a color or gray level. These inputs respond with a binary 1 when receiving a value equal or close to their color or gray level, and with a binary 0 otherwise. The binary outcomes of these inputs are the VGRAM neuron's input patterns.

In this work, we use VGRAM neurons to implement our WNN based architectures for controlling vergence.
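The short Python sketch below illustrates the training and recall behavior just described. It is only our illustration of a VGRAM look-up-table neuron, not code taken from MAE; the class name and the binary-pattern representation are assumptions.

    import random

    class VGRAMNeuron:
        # Illustrative VGRAM neuron: training stores input-output pairs in a
        # look-up table; recall generalizes by Hamming distance (Section 3).
        def __init__(self):
            self.lut = []  # list of (input pattern, desired output) pairs

        def train(self, pattern, output):
            # No weights are adjusted; the pair is simply stored.
            self.lut.append((tuple(pattern), output))

        def recall(self, pattern):
            pattern = tuple(pattern)
            distances = [(sum(a != b for a, b in zip(stored, pattern)), out)
                         for stored, out in self.lut]
            best = min(d for d, _ in distances)
            # An exact match (distance 0) returns the stored output; otherwise
            # the nearest stored pattern answers, ties broken at random.
            candidates = [out for d, out in distances if d == best]
            return random.choice(candidates)

The gray-level-sensitive inputs mentioned above would be responsible for producing the binary patterns fed to such a neuron.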
4. The MAE Tool

The Event Associative Machine (MAE) is an open source framework for modeling VGRAM WNNs that we have developed. MAE is similar to the Neural Representation Modeller (NRM), developed by the Neural Systems Engineering Group at Imperial College London and commercialized by Novel Technical Solutions [11]. MAE differs from NRM in three main aspects: it is open source, it runs on UNIX (currently, Solaris and Linux), and it uses a textual language to describe WNNs.

MAE allows designing, training and analyzing the behavior of modular WNNs whose basic modules are two-dimensional neural layers and two-dimensional filters. Neural layer neurons can have several characteristics (a type, input sensitivity, memory size, etc.), and the user can freely design filters using the C programming language. The user specifies these modular WNNs using the Neural Architecture Description Language (NADL). NADL source files are compiled into MAE applications, which have a built-in graphical user interface and a Control Script Language (CDL) interpreter. The user can train, recall, and alter neural layer contents using the graphical interface or scripts in CDL. NADL and CDL allow the creation of integer variables and the use of C-like for, while and do-while loops, which enable the creation of elaborate WNNs and of powerful scripts for analyzing them.

5. Methodology

5.1. The Retina Model

To implement our vergence control system we have created a simple retina model to provide visual input preprocessing. An artificial retina that follows this model, like its biological counterpart, is sensitive to contrast and provides high resolution at its center (fovea) and low resolution at its borders. This retina model was built using a MAE filter named the "gauss" filter (Figure 1), which receives as input the image that enters the virtual cameras (Image_left and Image_right in Figure 1) and produces as output an image that maps the input image following a gaussian distribution centered at a point on the input image.
Our model produces a very distorted version of the input image, as shown in Figure 6, but one possibly similar to the parvocellular output of the biological retina. We use two retinas to allow binocular vision.

Figure 6. Test image and the corresponding gauss filter output

Our gauss filter was implemented using a 9x9 difference of gaussians (DOG [12]) kernel. The application of this kernel to a region of the input image results in an approximation of the DOG model of retinal ganglion cells. However, convolving the kernel with the input image alone would not produce the result shown in Figure 6, since this image processing technique (convolution) is applied linearly over the image [13]. So, instead of applying the kernel linearly, in order to emulate the correlation between ganglion cells and different areas of the retina, we have applied the DOG kernel according to a gaussian distribution centered at the point in the input image to which each virtual camera's fovea is pointing.
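The fragment below sketches, in Python, one possible reading of this description: a 9x9 DOG kernel applied with a sampling density that decays away from the fovea, producing a 64x64 foveated output layer. The kernel sigmas and the expansion rule are our assumptions; they are not values taken from the MAE gauss filter.

    import numpy as np

    def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
        # 9x9 difference-of-gaussians kernel approximating the center-surround
        # response of a retinal ganglion cell (the sigmas are illustrative).
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        center = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_center ** 2))
        surround = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_surround ** 2))
        return center / center.sum() - surround / surround.sum()

    def gauss_filter(image, fovea_x, fovea_y, out_size=64, expansion=2.0):
        # For every neuron of the 64x64 output layer, choose an input position
        # whose distance from the fovea grows nonlinearly with the neuron's
        # distance from the output center, so that sampling is dense at the
        # fovea and sparse at the borders, and apply the DOG kernel there.
        kernel = dog_kernel()
        r = kernel.shape[0] // 2
        h, w = image.shape
        out = np.zeros((out_size, out_size))
        for i in range(out_size):
            for j in range(out_size):
                dy = (i - out_size / 2.0) / (out_size / 2.0)   # range -1 .. 1
                dx = (j - out_size / 2.0) / (out_size / 2.0)
                y = int(fovea_y + np.sign(dy) * abs(dy) ** expansion * h / 2.0)
                x = int(fovea_x + np.sign(dx) * abs(dx) ** expansion * w / 2.0)
                y = int(np.clip(y, r, h - r - 1))
                x = int(np.clip(x, r, w - r - 1))
                out[i, j] = np.sum(kernel * image[y - r:y + r + 1, x - r:x + r + 1])
        return out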
5.2. The TE Filter

We have also modeled TE, TN and TF disparity detection cells (see Subsection 2.3). They have been implemented as the MAE TE filter, which receives as input the left and right retina images produced by the gauss filter. According to an integer displacement parameter, the TE filter emulates a TE, TN or TF layer of cells. The operation performed by this filter is very simple: each output pixel is the average of the corresponding pixels of the two input images. If the integer parameter is zero, the input images are aligned and the filter emulates TE cells. If the displacement parameter is positive, the input images are displaced by a number of pixels equal to the parameter value – the left image to the left and the right image to the right. This emulates TN cells, since points of objects nearer to the virtual cameras than the point at which their foveae are aimed tend to coincide when the images are displaced this way. If the parameter is negative, the input images are displaced by a number of pixels equal to the parameter value, but in the opposite direction. This emulates TF cells.

Figure 7 shows the outputs of the left and right virtual retinas obtained from the input images shown in Figure 2 (please note where each fovea is pointing), while Figure 8 shows examples of TE, TN and TF neuron layers.

Figure 7. TE filter input images

Figure 8. TE filter output images. a) Output image with displacement parameter equal to zero. b) Output image with displacement parameter equal to 2. c) Output image with displacement parameter equal to –2.
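Expressed as array code, the TE filter described above reduces to a shift followed by an average. The sketch below is our reading of that description, not the MAE C source; border handling with np.roll wraps around, which the real filter presumably does not do.

    import numpy as np

    def te_filter(gauss_left, gauss_right, displacement=0):
        # Each output pixel is the average of corresponding pixels of the two
        # retina images. displacement = 0 emulates TE cells; a positive value
        # shifts the left image to the left and the right image to the right
        # (TN cells); a negative value shifts in the opposite direction (TF).
        shifted_left = np.roll(gauss_left, -displacement, axis=1)
        shifted_right = np.roll(gauss_right, displacement, axis=1)
        return (shifted_left + shifted_right) / 2.0

    # The layers of Figure 8 would correspond to displacements 0, 2 and -2.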
5.3. The Minus Filter

Another filter we have used is the minus filter. It receives two images and returns one, where each pixel is calculated in the following way: if the value of a pixel f in the first image is equal to or larger than the value of the corresponding pixel s in the second image, the output is equal to f – s; otherwise, it is equal to zero.
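In the same array style, the minus filter is a clamped pixelwise subtraction (again only a sketch, assuming two grayscale arrays of equal shape):

    import numpy as np

    def minus_filter(first, second):
        # f - s where the pixel of the first image is >= the corresponding
        # pixel of the second image, and 0 otherwise.
        return np.where(first >= second, first - second, 0)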
5.4. The Virtual World

The virtual world we have used in our experiments was developed using OpenGL [14]. The initial code was downloaded from www.linux.org/apps/AppId4310.html, and was then modified to produce an image stereo pair like the one shown in the Image_right and Image_left windows of Figure 2. The images are two-dimensional projections of a 3D model of a robot arm. These images are approximately equivalent to those that would be seen by a human observer (7 cm eye separation) one meter away from a similar, one meter high robot arm.

5.5. The WNN Vergence Control System

Our WNN vergence control system receives as input several filtered versions of the left and right input images and generates as output a value to be added to the current horizontal position of a single virtual camera's fovea. In our model, the right virtual camera is the dominant camera and decides (actually, a human operator decides) where to look, while the left virtual camera is controlled by the WNN vergence control system. The only "motor neuron function" emulated is the horizontal eye movement. So, the controlled virtual camera's fovea moves to the left or to the right of its previous position depending on the control system output signal.
However, in order to generate the correct output, the WNN must be trained. The WNN training was performed according to the following simple algorithm:

1. The operator chooses a point and aims the dominant (right) camera's fovea at it (using a mouse click).
2. The system automatically moves the controlled (left) camera's fovea by the same amount, possibly positioning it at an incorrect point due to a wrong vergence angle for the chosen point.
3. The operator informs the training system (using a mouse click) what the correct position of the controlled camera's fovea should be.
4. The system trains the WNN to output a pattern suitable for moving the controlled camera's fovea to the correct position.
5. The system allows the WNN to move the controlled camera's fovea to the correct position.
6. Go back to step 1.

We have implemented this algorithm using a CDL script (Section 4) and used it to train several WNN based vergence control system architectures. All networks were trained using 32 selected points.

In recall mode, the system acquires images, calculates the filter outputs and the WNN outputs, and then generates the control signals that move the controlled camera's fovea. Note that the fovea movement modifies the output of the controlled camera's retina, closing the control loop. The recall mode procedure was implemented with a CDL script and repeated 50 times for each testing point. 24 unknown points (not used during training) were used for testing the performance of the system.
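A minimal sketch of this closed loop is given below. The callables stand for the corresponding MAE/CDL operations (image acquisition, filtering, and WNN recall); their names are placeholders of ours, not actual MAE calls.

    def vergence_recall_loop(grab_images, compute_filters, wnn_recall,
                             initial_fovea_x, iterations=50):
        # Each iteration: acquire the camera images for the current fovea
        # position, compute the filter outputs, recall the WNN, and add its
        # output to the horizontal fovea position, which changes the images
        # seen at the next iteration (the loop is closed through the retina).
        fovea_x = initial_fovea_x
        history = []
        for _ in range(iterations):
            image_left, image_right = grab_images(fovea_x)
            filtered = compute_filters(image_left, image_right)
            fovea_x += wnn_recall(filtered)
            history.append(fovea_x)
        return history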
5.6. Performance Parameters

We have used two performance parameters: the average error and the number of unstable points. The average error is the average number of pixels the fovea is away from the correct point over 40 of the 50 recall script iterations – we have excluded the first 10 iterations because the system uses them to incrementally move the controlled fovea to the target point. With some testing points, an architecture may not be able to stabilize vergence and may move the controlled fovea to the limit of the input image. We count these points as unstable points.
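Under this definition, the average error for a single testing point can be computed from the 50 recorded fovea positions as follows (a sketch; the 10/40 split follows the text above):

    def average_error(fovea_positions, target_x, settle=10):
        # Ignore the first `settle` iterations, used to move the fovea toward
        # the target, and average the absolute pixel distance over the rest.
        steady = fovea_positions[settle:]
        return sum(abs(x - target_x) for x in steady) / len(steady)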
5.7. Architectures Characteristics

Many architectures were implemented, but all of them can be described by three main characteristics: the type of neuron, the size and type of the inputs to the control neural layers, and the type of control strategy.

We have used two types of neuron: the default and the rnd_mem_zrizro neurons. The default neuron responds exactly as in the literature, while the rnd_mem_zrizro neuron outputs zero when all its inputs are zero.

The virtual camera images, after being preprocessed by the retinas, can be filtered in five different ways, producing five possible inputs for the vergence control neural layers: i) TE, the output of the TE filter with displacement equal to zero; ii) TE1, the output of the TE filter with displacement 2; iii) TE2, the output of the TE filter with displacement -2; iv) right_minus_left, the output of the minus filter with the right retina minus the left retina as inputs; and v) left_minus_right, which is the opposite of right_minus_left. The number of connections between each vergence control neuron and each of these inputs can vary. In addition, the way these connections are distributed throughout the input can also vary. We have implemented two types of connection distribution: gaussian (G) and random (R). In the gaussian connection type, a neuron's inputs are connected according to a gaussian distribution centered at the point of the input that corresponds to the neuron's position in the control layer – this type of connection tries to emulate a retinotopic mapping. In the random connection type, a neuron's inputs are connected to the corresponding input randomly.

Two types of control strategy were implemented: ResetY and Set2Layers. In the ResetY strategy, a single vergence control neuron layer is used. This layer is divided into two halves, left and right. The number of active neurons on the right minus the number of active neurons on the left is the amount added to the current horizontal fovea position. In the Set2Layers strategy, two neuron layers are used: one is trained for positive movements (to the right) and the other for negative movements (to the left). The number of active neurons on the right layer minus the number of active neurons on the left layer is the amount added to the current horizontal fovea position. In both strategies, during training, the control neural layers are first recalled "seeing" the image of the wrong vergence position. Previously learned information or random values are output according to the VGRAM neuron behavior (see Section 3). This output is then changed by turning off a percentage of the appropriate neurons in order to produce the correct fovea movement. The control neural layers are then trained "seeing" the wrong image and outputting this corrective control action.
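Read literally, both strategies turn counts of active control neurons into a horizontal correction. The sketch below shows that computation under our reading (binary control layers; the split into halves and the two-layer arrangement follow the description above):

    import numpy as np

    def reset_y_correction(control_layer):
        # ResetY: a single binary control layer split into left and right
        # halves; the correction is (active neurons on the right half) minus
        # (active neurons on the left half).
        half = control_layer.shape[1] // 2
        left_count = int(np.count_nonzero(control_layer[:, :half]))
        right_count = int(np.count_nonzero(control_layer[:, half:]))
        return right_count - left_count

    def set2layers_correction(right_layer, left_layer):
        # Set2Layers: one layer trained for rightward (positive) movements and
        # one for leftward (negative) movements; the correction is the
        # difference between their numbers of active neurons.
        return int(np.count_nonzero(right_layer)) - int(np.count_nonzero(left_layer))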
5.8. Implemented Architectures

Table 2 summarizes the characteristics of the eight architectures we have implemented. Architecture 1 of Table 2 is, actually, the architecture depicted in Figure 1, and its NADL specification is shown in Figure 9.

Line 1 of Figure 9 (we have numbered the lines only to facilitate the description) is a global command that sets the neuron memory size of all neurons to 32. We have used this neuron memory size in all architectures because the number of training points is equal to 32.
Lines 3 to 8 specify the right and left input image sizes (512x512 pixels), their type of pixel (grayscale: 8 shades of gray), and their generator and controller C functions. These two images are filtered by our gauss filter, lines 13 and 14, producing the neural layers gauss_right and gauss_left, which are created by lines 10 and 11. Note that the size of these neural layers is only 64x64 neurons. Lines 16 and 17 create two output windows that allow the human operator to monitor gauss_right and gauss_left. These outputs are connected to gauss_right and gauss_left by lines 19 and 20. These windows are the two top right windows of Figure 2 (both foveae are pointing to the center of the image).

     1. set NEURON_MEMORY_SIZE = 32;
     2.
     3. input image_right[512][512] with greyscale outputs
     4.        produced by input_generator()
     5.        controled by input_controler_right();
     6. input image_left[512][512] with greyscale outputs
     7.        produced by input_generator()
     8.        controled by input_controler_left(control_out);
     9.
    10. neuronlayer gauss_right[64][64] with greyscale outputs;
    11. neuronlayer gauss_left[64][64] with greyscale outputs;
    12.
    13. filter image_right with gauss_filter() producing gauss_right;
    14. filter image_left with gauss_filter() producing gauss_left;
    15.
    16. output gauss_right_out[64][64];
    17. output gauss_left_out[64][64];
    18.
    19. outputconnect gauss_right to gauss_right_out;
    20. outputconnect gauss_left to gauss_left_out;
    21.
    22.
    23. neuronlayer control[64][64] of rnd_mem_zrizro neurons
    24.        with b&w outputs;
    25. neuronlayer right_minus_left[64][64] with greyscale outputs;
    26.
    27. output right_minus_left_out[64][64];
    28. output control_out[64][64];
    29.
    30. outputconnect right_minus_left to right_minus_left_out;
    31. outputconnect control[@][@] to control_out;
    32.
    33. filter gauss_right, gauss_left with minus_filter()
    34.        producing right_minus_left;
    35.
    36. connect right_minus_left to control
    37.        with 32 random inputs per neuron and
    38.        gaussian distribution with radius 3.0;
    39.
    40. associate control with control;

Figure 9. NADL specification of the Figure 1 architecture

Lines 1 to 20 of Figure 9 are common to all the architectures we have implemented and describe an architecture up to its two retinas. However, lines 23 to 40 are specific to architecture 1. The control neural layer of this architecture is created in line 23 and is composed of rnd_mem_zrizro neurons – the first X of the architecture 1 column in Table 2 represents this characteristic. Note that the function input_controler_left receives this neural layer as a parameter (line 8 of Figure 9).

The control neural layer receives input from a right_minus_left filter only, and each of its neurons has 32 gaussian distributed connections to this filter's output. This is specified by lines 36 to 38 of Figure 9 and, in Table 2, represented by the 32 G in the architecture 1 column. Lines 27, 28, 30 and 31 allow monitoring of the control neural layer and of the right_minus_left filter outputs.

Finally, line 40 of Figure 9 specifies that the system should train the control neural layer using its own output as the target pattern.

Table 2. Implemented Architectures

  Architecture: 1 2 3 4 5 6 7 8
  Neuron type:
    rnd_mem_zrizro: X X X
    default: X X X X X
  Control layer inputs:
    right_minus_left: 32 G, 32 G, 32 G, 32 G, 32 G, 32 R
    left_minus_right: 32 G, 32 G
    TE, TE1 and TE2: 32 G, 32 G, 16 G, 16 R
  Control strategy:
    ResetY: X X X X X X X
    Set2Layers: X

6. Experimental Results

Table 3 and Table 4 summarize the experimental results obtained with the 8 architectures presented in Table 2, according to the performance parameters presented in Subsection 5.6. We have tested the architectures with the points used during training (Table 3) and with unknown points (Table 4). The best results in Table 3 and Table 4 are shown in gray.

By comparing Table 3 and Table 4, one can conclude that architecture 8 presents the best results, showing no unstable points for either known or unknown points, the smallest average error for known points, and the runner-up average error for unknown points. Our best performing architecture produces vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

The model of our best architecture is depicted in Figure 10. This architecture uses default neurons randomly connected to four filter outputs: right_minus_left, with 32 connections per neuron, and the TE, TN and TF filters, with 16 connections per neuron (a total of 32 + 3x16, or 80, connections per neuron). Architecture 8 has Set2Layers as its motor control strategy.

Table 3. Results for known points

  Architecture:     1      2      3     4      5      6      7     8
  Average Error:    19,20  23,59  4,48  11,64  10,02  -3,84  3,73  0,26
  Unstable Points:  3      4      3     7      4      7      0     0

Table 4. Results for unknown points

  Architecture:     1      2      3      4     5     6     7      8
  Average Error:    28,12  3,77   43,18  1,77  9,11  8,02  14,46  -3,58
  Unstable Points:  0      3      9      7     8     5     0      0

7. Discussion

In our system, vergence control is achieved using data coming from most of the input image pixels, but with greater attention devoted to the fovea region.
Our system allows the distance of different points of objects in the 3D space to be calculated by moving the dominant camera's fovea to such a point and computing the vergence angle after the system achieves vergence. With this information and the fovea image of parts of the 3D objects, we believe it is possible to build a representation of these objects internal to the system. However, we are still a long way from that, since we first need to build the parts of the system capable of: choosing the interesting points of the 3D objects (saccadic movement control), recognizing these interesting points, building a 3D representation of the objects, and grouping all these objects into a consistent internal representation of the 3D scene.
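As an illustration of the kind of computation involved, with the 7 cm camera separation of our virtual world (Subsection 5.4) the depth of the verged point follows from simple triangulation. The paper does not give this formula; the sketch below is only the standard approximation for a symmetric vergence configuration.

    import math

    def distance_from_vergence(vergence_angle_deg, baseline_m=0.07):
        # Symmetric vergence: the two optical axes meet at the fixated point;
        # half the baseline divided by the tangent of half the vergence angle
        # gives the distance from the cameras' baseline to that point.
        half_angle = math.radians(vergence_angle_deg) / 2.0
        return (baseline_m / 2.0) / math.tan(half_angle)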
Several works dealing with vergence in the context of binocular tracking can be found in the literature [15, 16, 17]. Our work is similar to these in various aspects; however, to our knowledge, there is no other work in the literature using WNNs to control vergence.

Figure 10. The best architecture

8. Conclusion

This paper presents the methodology we have used to develop a vergence control system for an artificial stereo vision system based on Weightless Neural Networks (WNNs). The system was developed using a tool we have created for modeling WNN architectures, named Event Associative Machine (MAE). We have used current knowledge about the human biological vision system to devise several architectural models of vergence control systems. Our best architecture produces vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

As future work, we plan to use real cameras to test our system, and to develop a tool to create and test neural architectures automatically using genetic programming techniques. This tool will allow searching for the best parameters of our WNN architectures in a straightforward way and, we believe, will be important for our research towards implementing a WNN system capable of emulating the human biological visual system's capacity for creating a steady internal representation of the environment.

References

[1] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, "Principles of Neural Science", 4th Edition, Prentice-Hall International, 2000.
[2] S. M. Ebenholtz, "Oculomotor Systems and Perception", Cambridge University Press, 2001.
[3] T. B. Ludermir, A. Carvalho, A. P. Braga, and M. C. P. Souto, "Weightless Neural Models: A Review of Current and Past Works", Neural Computing Surveys 2, pp. 41-61, 1999.
[4] A. P. Braga, A. P. L. F. Carvalho, and T. B. Ludermir, "Fundamentos de Redes Neurais Artificiais", Rio de Janeiro: DCC/IM, COPPE/Sistemas, NCE/UFRJ, 11ª Escola de Computação, 1998.
[5] F. Gonzalez and R. Peres, "Neural Mechanisms Underlying Stereoscopic Vision", Progress in Neurobiology, Elsevier Science Ltd., Vol. 55, pp. 191-224, 1998.
[6] J. Semmlow, G. Hung, J. Horng, and K. Ciuffreda, "Disparity Vergence Eye Movements Exhibit Preprogrammed Motor Control", Vision Research, 34(10): 1335-1343, 1994.
[7] I. Aleksander, "Self-Adaptive Universal Logic Circuits", IEE Electronic Letters, 2:231, 1966.
[8] I. Aleksander, "Ideal Neurons for Neural Computers", in R. Eckmiller, G. Hartmann, and G. Hauske (editors), Parallel Processing in Neural Systems and Computers, pp. 225-228, Elsevier Science, 1989.
[9] J. Mrsic-Flogel, "Approaching Cognitive System Design", Proceedings of the International Conference on Artificial Neural Networks (ICANN'91), Vol. 1, pp. 879-883, 1991.
[10] J. Austin (editor), "RAM-Based Neural Networks", World Scientific, 1998.
[11] Novel Technical Solutions, "NRM: Neural Representation Modeller Version 1.2.01", Novel Technical Solutions, 1998. http://www.sonnet.co.uk/nts
[12] R. W. Rodieck, "Quantitative Analysis of Cat Retinal Ganglion Cell Response to Visual Stimuli", Vision Research, 5: 583-601, 1965.
[13] R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Addison-Wesley Publishing Company, Inc., 1992.
[14] T. McReynolds and D. Blythe, "Advanced Graphics Programming Techniques Using OpenGL", SIGGRAPH'99 Course, 1999. Available at http://www.opengl.org/developers/code/sig99/advanced99/notes/notes.html
[15] A. Bernardino and J. Santos-Victor, "Visual Behaviors for Binocular Tracking", Robotics and Autonomous Systems 25, Elsevier, 1998.
[16] M. M. Marefat and L. Wu, "Purposeful Gazing and Vergence Control for Active Vision", Robotics & Computer-Integrated Manufacturing, Vol. 12, No. 2, pp. 135-155, 1996.
[17] N. Kita, S. Rougeaux, Y. Kuniyoshi, and F. Chavand, "Binocular Tracking Based on Virtual Horopters", Proceedings of IROS'94, pp. 2052-2057, 1994.
