1. Introduction
Our visual perception of the world is egocentric, i.e., centered on our perceived position in relation to the environment. Normally, we do not even notice that. It would be awful, however, if our perception were oculocentric, i.e., centered on the position of our eyes in relation to the environment. In such a case, we would perceive the world moving with the movements of our eyes. Nevertheless, in an apparent paradox, the brain areas that first analyze the attributes of the visual input possess an accurate oculocentric neural organization [1]. Still, when moving our eyes, we feel that the exterior world is steady. This shows that human visual perception depends not only on the image on the retina, but also on knowledge of the position of the eyes, head, and body. Therefore, the oculomotor system influences visual perception, and information regarding its status is important for visual perception of the world [2].

This paper presents the first step of a larger research effort whose main objective is to implement a system, based on artificial Weightless Neural Networks (WNNs [3, 4, 10]), capable of emulating the human biological visual system's capacity for creating a steady internal representation of the environment. We believe that the oculomotor system plays a significant role in building this internal representation, since it is responsible for the movements of the eyes.

In our implementation, images captured by virtual cameras are used for controlling the position of these cameras' "foveae". The control signals for positioning the foveae are generated by WNNs, which receive as input the images captured by the cameras. To implement this vergence system, we have used current knowledge about the neurophysiology of the biological visual system and a software tool we have developed to emulate WNN architectures. This tool, named MAE (Portuguese acronym for Event Associative Machine), allows simulation of complex WNN architectures with hundreds of thousands of neurons on a standard PC running Linux. An example architecture of a vergence control system we have studied using MAE is shown in Figure 1, while Figure 2 shows a screenshot of MAE running the neural architecture of Figure 1.

Figure 1. Simple neural architecture

In Figure 1, Image_left and Image_right are images captured by two virtual cameras positioned at distinct places in the virtual space to emulate the physical separation of the eyes. The symbol ƒ denotes two-dimensional filters; the remaining symbols denote two-dimensional neural layers and outputs that allow the MAE user to observe the status of a neural layer.

We have implemented and tested several neural architectures like the one shown in Figure 1, using filters and trained neural layers to emulate neurophysiological properties observed in the human biological visual system. Our best performing architecture produces vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

Figure 2. MAE screenshot of a simple neural architecture

2. Biological Vision

Vision starts with two external sensors, the eyes. The retina, situated at the back of the eyeball, is where the image of the outside world is captured. The neurons of the retina transform light energy into electrical signals that are transmitted to the brain via the optic nerve. The retinal ganglion cells, which drive the optic nerve, can be separated into two categories: parvocellular (small cells) and magnocellular (large cells). These two categories of cells are the starting point of parallel pathways that can be identified throughout most parts of the brain devoted to visual processing [1].

2.1. Visual Pathways

The optic nerve connects to the lateral geniculate nucleus (LGN) of the thalamus, and this to the visual cortex. The visual cortex is usually divided into 5 separate areas, V1-V5 [1]; the LGN connects to the primary visual cortex, or V1 (also called striate cortex, Figure 3). V1 possesses cells that detect several basic attributes of the visual stimuli. There are neurons that detect stimuli that have a certain angular orientation, others are sensitive to color, others to disparities between stimuli coming from each eye, etc.

Figure 3. Eye, LGN, V1

From V1, the visual neuronal impulses are directed to V2. V2 projects to V3, V4, and V5, and each of these areas sends information to any of 20 or more other areas of the brain that process visual information. This general arrangement can be subdivided into three parallel pathways, which start in the parvocellular and magnocellular cells of the retina (in V1, the parvocellular pathway is divided into two, parvocellular-interblob and parvocellular-blob, as can be seen in Figure 4). Although each pathway is somewhat distinct in function, there is intercommunication between them.

The magnocellular cortical pathway starts in layer 4Cα of V1 (Figure 4). This layer projects to layer 4B, which then projects to the thick stripes of V2. Area V2 then projects to V3, to V5 (middle temporal cortex, or MT), and to the medial superior temporal cortex (MST). This pathway is an extension of the magnocellular pathway from the retina and LGN, and processes visual detail leading to the perception of shape in area V3, and of movement or motion in areas V5 and MST. The retina's magnocellular ganglion neurons show a high sensitivity to contrast, low spatial resolution, and high temporal resolution, or fast transient responses to visual stimuli. These characteristics make the magnocellular branch of the visual system especially suitable for quickly detecting novel or moving stimuli.

In the second visual pathway, neurons of the parvocellular-interblob areas of V1 project to the pale stripes of V2, and these to the inferior temporal cortex (Figure 4). This pathway possesses several feature detectors (simple, complex and hypercomplex cells) and is responsible for complex form recognition.

Finally, in the third and mixed visual cortical pathway, neurons in the blobs of V1 project to the thin stripes of V2. The thin stripes of V2 then project to V4. Area V4 receives input from both the parvo- and magnocellular pathways of the retina and LGN. This parvocellular portion of V4 is particularly important for the perception of color and the maintenance of color perception regardless of lighting (color constancy).

The retina's parvocellular ganglion neurons show a low sensitivity to contrast, high spatial resolution, and low temporal resolution, or sustained responses to visual stimuli. These cellular characteristics make the parvocellular visual pathways especially suitable for the analysis of detail in the visual world.

These separated parallel pathways decode/extract the elementary properties of the objects in the visual field, such as color, form and orientation, producing a rather complex internal representation of what we see. However, this representation is not captured instantly: we examine our surroundings all the time, focusing our attention on different parts of the field of view through the movement of our eyes. We believe that the mind creates this internal representation in a way similar to a painter working on his canvas, where the eyes serve as the painter's brush.

There are five functionally distinct types of eye movement (Table 1). The first four are conjugate: both eyes move in the same direction. The fifth – vergence – is disconjugate: the eyes move in different directions and sometimes by different amounts. Thanks to the vergence oculomotor subsystem, when we look at an object that is near or far from us, each eye moves differently to keep the image of the object aligned precisely on each fovea. If the object is near, the eyes converge; if the object is far, the eyes diverge. The difference in retinal position of an object in one eye compared to its position in the other is referred to as retinal disparity, and the detection of this disparity is important for vergence control.

Table 1. A Functional Classification of Eye Movements

  Movement            Function
  Movements that stabilize the eye when the head moves:
    Vestibulo-ocular  For brief or rapid head rotation
    Optokinetic       For sustained or slow head rotation
  Movements that keep the fovea on a visual target:
    Saccade           Brings new objects of interest onto the fovea
    Smooth pursuit    Holds the image of a moving target on the fovea
    Vergence          Adjusts the eyes for different viewing distances in depth
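The disparity-to-vergence loop described above can be sketched in conventional (non-WNN) form. The function names, the sum-of-squared-differences matching, and the proportional gain below are illustrative assumptions, not the paper's controller:

```python
import numpy as np

def estimate_disparity(left, right, max_shift=8):
    """Return the horizontal shift (in pixels) that best aligns the
    right scanline with the left one, found by exhaustive search over
    integer shifts with a sum-of-squared-differences cost."""
    best_shift, best_cost = 0, float("inf")
    valid = slice(max_shift, len(left) - max_shift)  # ignore wrapped pixels
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(right, s)
        cost = float(np.sum((left[valid] - shifted[valid]) ** 2))
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

def vergence_command(disparity, gain=0.5):
    """Map measured disparity to a camera rotation command:
    converge (positive) or diverge (negative), proportional to the error."""
    return gain * disparity

# A toy 1-D "scanline": the right view sees the same pattern shifted by 3 pixels.
left = np.zeros(64)
left[30:34] = 1.0
right = np.roll(left, -3)

d = estimate_disparity(left, right)
print(d, vergence_command(d))  # prints: 3 1.5
```

In the paper's system this mapping is learned by weightless neural layers rather than computed by an explicit search, but the control objective – drive the measured retinal disparity toward zero – is the same.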
6. Experimental Results

Table 3 and Table 4 summarize the experimental results obtained with the 8 architectures represented in Table 2, according to the performance parameters presented in Subsection 5.6. We have tested the architectures with the points used during training (Table 3) and with unknown points (Table 4). The best results in Table 3 and Table 4 are shown in gray.

By comparing Table 3 and Table 4, one can conclude that architecture 8 presents the best results, showing no unstable points for either known or unknown points, the smallest average error for known points, and the second-smallest average error for unknown points. Our best performing architecture produces vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

The model of our best architecture is depicted in Figure 10. This architecture uses default neurons randomly connected to the outputs of four filters: right_minus_left, with 32 connections per neuron, and the TE, TN and TF filters, with 16 connections per neuron each (a total of 32 + 3×16, or 80 connections per neuron). Architecture 8 has Set2Layers as its motor control strategy.

     1. set NEURON_MEMORY_SIZE = 32;
     2.
     3. input image_right[512][512] with greyscale outputs
     4.     produced by input_generator()
     5.     controled by input_controler_right();
     6. input image_left[512][512] with greyscale outputs
     7.     produced by input_generator()
     8.     controled by input_controler_left(control_out);
     9.
    10. neuronlayer gauss_right[64][64] with greyscale outputs;
    11. neuronlayer gauss_left[64][64] with greyscale outputs;
    12.
    13. filter image_right with gauss_filter() producing gauss_right;
    14. filter image_left with gauss_filter() producing gauss_left;
    15.
    16. output gauss_right_out[64][64];
    17. output gauss_left_out[64][64];
    18.
    19. outputconnect gauss_right to gauss_right_out;
    20. outputconnect gauss_left to gauss_left_out;
    21.
    22.
    23. neuronlayer control[64][64] of rnd_mem_zrizro neurons
    24.     with b&w outputs;
    25. neuronlayer right_minus_left[64][64] with greyscale outputs;
    26.
    27. output right_minus_left_out[64][64];
    28. output control_out[64][64];
    29.
    30. outputconnect right_minus_left to right_minus_left_out;
    31. outputconnect control[@][@] to control_out;
    32.
    33. filter gauss_right, gauss_left with minus_filter()
    34.     producing right_minus_left;
    35.
    36. connect right_minus_left to control
    37.     with 32 random inputs per neuron and
    38.     gaussian distribution with radius 3.0;
    39.
    40. associate control with control;

Figure 9. NADL specification of the Figure 1 architecture
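Figure 9 declares a layer of rnd_mem_zrizro neurons with NEURON_MEMORY_SIZE = 32 and random, gaussian-distributed input connections per neuron. The exact semantics of MAE's neuron types are not given in this excerpt; the sketch below only illustrates the general RAM-based (weightless) neuron idea from the cited literature [3, 7, 10], with a hypothetical class name (RAMNeuron) and a deliberately small 5-bit address so that the lookup table has 32 entries:

```python
import random

class RAMNeuron:
    """A generic RAM-based (weightless) neuron: a few binary inputs are
    sampled from fixed random positions of an input vector and concatenated
    into an address; learning writes the desired output into the addressed
    memory cell instead of adjusting synaptic weights."""

    def __init__(self, input_size, n_connections, rng):
        # Fixed random connections, loosely mirroring the random per-neuron
        # wiring declared by the NADL 'connect' statement in Figure 9.
        self.connections = rng.sample(range(input_size), n_connections)
        self.memory = {}  # address tuple -> stored output (defaults to 0)

    def _address(self, bits):
        return tuple(bits[i] for i in self.connections)

    def train(self, bits, output):
        self.memory[self._address(bits)] = output

    def recall(self, bits):
        return self.memory.get(self._address(bits), 0)

rng = random.Random(42)
neuron = RAMNeuron(input_size=64, n_connections=5, rng=rng)  # 2^5 = 32 cells

pattern = [1 if i % 2 == 0 else 0 for i in range(64)]
neuron.train(pattern, 1)
print(neuron.recall(pattern))  # prints: 1
```

Learning here is a single memory write, which is one reason WNN training is fast; generalization comes from an ensemble of many neurons with different random connection sets, not from any single lookup table.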
Lines 1 to 20 of Figure 9 are common to all architectures we have implemented and describe an architecture up to its 2 retinas. However, lines 23 to 40 are specific to architecture 1. The control neural layer of this architecture is created in line 23 and is composed of rnd_mem_zrizro neurons – the first X in the architecture 1 column of Table 2 represents this characteristic. Note that the function input_controler_left receives this neural layer as a parameter (line 8 of Figure 9).

The control neural layer receives input from the right_minus_left filter only, and each of its neurons has 32 gaussian-distributed connections to this filter's output. This is specified by lines 36 to 38 of Figure 9 and, in Table 2, represented by the 32G in the architecture 1 column. Lines 27, 28, 30 and 31 allow monitoring the control neural layer and the right_minus_left filter outputs.

Table 3. Results for known points

  Architecture      1      2      3      4      5      6      7      8
  Average Error   19.20  23.59   4.48  11.64  10.02  -3.84   3.73   0.26
  Unstable Points   3      4      3      7      4      7      0      0

Table 4. Results for unknown points

  Architecture      1      2      3      4      5      6      7      8
  Average Error   28.12   3.77  43.18   1.77   9.11   8.02  14.46  -3.58
  Unstable Points   0      3      9      7      8      5      0      0

7. Discussion

In our system, vergence control is achieved using data coming from most of the input images' pixels, but with greater attention devoted to the fovea region. Our system allows calculating the distance of different points of objects in the 3D space by moving the dominant camera's fovea to a point and calculating the vergence angle after the system achieves vergence. With this information and the fovea image of parts of the 3D objects, we believe it is possible to build a representation of these objects internal to the system. However, we are still a long way from that, since we first need to build the parts of the system capable of: choosing the interesting points of the 3D objects (saccadic movement control), recognizing these interesting points, building a 3D representation of the objects, and grouping all these objects in a consistent internal representation of the 3D scene.

Several works dealing with vergence in the context of binocular tracking can be found in the literature [15, 16, 17]. Our work is similar to these in various aspects; however, to our knowledge, there is no similar work in the literature using WNNs to control vergence.

Figure 10. The best architecture

8. Conclusion

This paper presents the methodology we have used to develop a vergence control system for an artificial stereo vision system based on Weightless Neural Networks (WNNs). The system was developed using a tool we have created for modeling WNN architectures, named Event Associative Machine (MAE). We have used current knowledge about the human biological vision system to devise several architectural models of vergence control systems. Our best architecture produces vergence movements with an average error of only 3.6 image pixels, which is equivalent to an angular error of approximately 0.615°.

As future work, we plan to use real cameras to test our system, and to develop a tool to create and test neural architectures automatically using genetic programming techniques. This tool will allow searching for the best parameters of our WNN architectures in a straightforward way and, we believe, will be important for our research towards implementing a WNN system capable of emulating the human biological visual system's capacity for creating a steady internal representation of the environment.

References

[1] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, "Principles of Neural Science, 4th Edition", Prentice-Hall International, 2000.
[2] S. M. Ebenholtz, "Oculomotor Systems and Perception", Cambridge University Press, 2001.
[3] T. B. Ludermir, A. Carvalho, A. P. Braga, and M. C. P. Souto, "Weightless Neural Models: A Review of Current and Past Works", Neural Computing Surveys 2, pp. 41-61, 1999.
[4] A. P. Braga, A. P. L. F. Carvalho, and T. B. Ludermir, "Fundamentos de Redes Neurais Artificiais", Rio de Janeiro: DCC/IM, COPPE/Sistemas, NCE/UFRJ, 11ª Escola de Computação, 1998.
[5] F. Gonzalez and R. Peres, "Neural Mechanisms Underlying Stereoscopic Vision", Progress in Neurobiology, Elsevier Science Ltd., Vol. 55, pp. 191-224, 1998.
[6] J. Semmlow, G. Hung, J. Horng, and K. Ciuffreda, "Disparity Vergence Eye Movements Exhibit Preprogrammed Motor Control", Vision Research, 34(10): 1335-1343, 1994.
[7] I. Aleksander, "Self-Adaptive Universal Logic Circuits", IEE Electronic Letters, 2:231, 1966.
[8] I. Aleksander, "Ideal Neurons for Neural Computers", in R. Eckmiller, G. Hartmann, and G. Hauske (editors), Parallel Processing in Neural Systems and Computers, pp. 225-228, Elsevier Science, 1989.
[9] J. Mrsic-Flogel, "Approaching Cognitive System Design", Proceedings of the International Conference on Artificial Neural Networks (ICANN'91), Vol. 1, pp. 879-883, 1991.
[10] J. Austin (editor), "RAM-Based Neural Networks", World Scientific, 1998.
[11] Novel Technical Solutions, "NRM: Neural Representation Modeller Version 1.2.01", Novel Technical Solutions, 1998. http://www.sonnet.co.uk/nts
[12] R. W. Rodieck, "Quantitative Analysis of Cat Retinal Ganglion Cell Response to Visual Stimuli", Vision Research, 5: 583-601, 1965.
[13] R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Addison-Wesley Publishing Company, Inc., 1992.
[14] T. McReynolds and D. Blythe, "Advanced Graphics Programming Techniques Using OpenGL", SIGGRAPH'99 Course, 1999. Available at http://www.opengl.org/developers/code/sig99/advanced99/notes/notes.html
[15] A. Bernardino and J. Santos-Victor, "Visual Behaviors for Binocular Tracking", Robotics and Autonomous Systems 25, Elsevier, 1998.
[16] M. M. Marefat and L. Wu, "Purposeful Gazing and Vergence Control for Active Vision", Robotics & Computer-Integrated Manufacturing, Vol. 12, No. 2, pp. 135-155, 1996.
[17] N. Kita, S. Rougeaux, Y. Kuniyoshi, and F. Chavand, "Binocular Tracking Based on Virtual Horopters", Proceedings of IROS'94, pp. 2052-2057, 1994.