Daniel O’Shea
djoshea@stanford.edu
11 March 2010
Overview
Project Prakash
Learning to Recognize Objects
Subjects
Conclusion
Motion Information as a Scaffold
Experimental and Theoretical Evidence for Temporal Learning
Future Directions
Project Prakash
Scientific Opportunity
Learning to Recognize
Fig. 1. Example illustrating how a natural image (a) is typically a collection of many regions of different hues
and luminances (b). The human visual system has to accomplish the task of integrating subsets of these regions
into coherent objects, illustrated in (c).
Central questions
Recovery Subjects
Control Subjects
[Fig. 2 panels: seven tasks labeled A–G ("How many objects?" for A–D and G, "Which object is in front?" for E, "Trace the long curve" for F), and a bar graph of performance (% correct) for the Control Group and subjects S.K., J.A., and P.B.]
Fig. 2. Subjects’ parsing of static images. Seven tasks (a) were used to assess the recently treated subjects’ ability to perform simple image segmentation and shape analysis. The graph (b) shows the performance of these subjects relative to the control subjects on these tasks. “NA” indicates that data are not available for a subject. S.K.’s tracing of a pattern drawn by one of the authors (c) illustrates the fragmented percepts of the recently treated subjects. In the upper row of (d), the outlines indicate the regions of real-world images that S.K. saw as distinct objects. He was unable to recognize any of these images. For comparison, the lower row of (d) shows the segmentation of the same images according to a simple algorithm that agglomerated spatially adjacent regions that satisfied a threshold criterion of similarity in their hue and luminance attributes.
Daniel O’Shea (djoshea@stanford.edu) Visual Parsing After Recovery from Blindness 8 / 19
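The agglomeration algorithm described in the caption can be sketched as a simple flood fill over adjacent pixels. This is an illustrative reconstruction under assumed details (pixels encoded as (hue, luminance) pairs, a single shared threshold), not the authors' implementation:

```python
from collections import deque

def segment(image, threshold):
    """Group spatially adjacent pixels whose hue and luminance each
    differ by no more than `threshold` (flood-fill agglomeration)."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] is not None:
                continue
            labels[sy][sx] = n_regions
            frontier = deque([(sy, sx)])
            while frontier:
                y, x = frontier.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] is None:
                        hue_d = abs(image[ny][nx][0] - image[y][x][0])
                        lum_d = abs(image[ny][nx][1] - image[y][x][1])
                        if hue_d <= threshold and lum_d <= threshold:
                            labels[ny][nx] = n_regions
                            frontier.append((ny, nx))
            n_regions += 1
    return labels, n_regions

# Toy image: a dark left half and a bright right half yield two regions.
img = [[(0.1, 0.2), (0.1, 0.2), (0.5, 0.9), (0.5, 0.9)] for _ in range(3)]
labels, n = segment(img, threshold=0.1)
```

Purely local grouping of this kind reproduces the over-fragmentation pattern: any hue or luminance discontinuity inside a single real-world object splits it into separate regions.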
Tests of Visual Parsing: Static Visual Parsing · Dynamic Visual Parsing · Recovery of Static Object Recognition
- Overfragmented percept evident in object tracing
- Segmentation consistent with a simple algorithm based on adjacent-region similarity in hue and luminance
[Fig. 4 panels: bar graph of performance (% correct) on four tasks A–D for the Control Group and subjects S.K., J.A., and P.B.]
Fig. 4. Recently treated and control subjects’ performance on tasks designed to assess the role of dynamic information in object segregation. “NA” indicates that data are not available for a subject. In the illustrations of the four tasks, the arrows indicate direction of movement.
Fig. 3. Motility ratings of the 50 images used to test object recognition and the recently treated subjects’ ability to recognize these images. Motility
ratings were obtained from 5 normally sighted respondents who were naive as to the purpose of the experiment; the height of the black bar below each
object indicates that object’s average rating on a scale from 1 (very unlikely to be seen in motion) to 5 (very likely to be seen in motion). The circles
indicate correct recognition responses.
Fig. 5. The recently treated subjects’ performance on four tasks with static displays soon after treatment and at follow-up testing after the passage of several months (indicated in the key).
Fig. 1. Experimental design and predictions. (A) IT responses were tested in “Test phase” (green boxes, see text), which alternated with “Exposure phase.” Each exposure phase consisted of 100 normal exposures (50 P→P, 50 N→N) and 100 swap exposures (50 P→N, 50 N→P). Stimulus size was 1.5° (16). (B) Each box shows the exposure-phase design for a single neuron. Arrows show the saccade-induced temporal contiguity of retinal images (arrowheads point to the retinal images occurring later in time, i.e., at the end of the saccade). The swap position was strictly alternated (neuron-by-neuron) so that it was counterbalanced across neurons. (C) Prediction for responses collected in the test phase: If the visual system builds tolerance using temporal contiguity (here driven by saccades), the swap exposure should cause incorrect grouping of two different object images (here P and N). Thus, the predicted effect is a decrease in object selectivity at the swap position that increases with increasing exposure (in the limit, reversing object preference), and little or no change in object selectivity at the non-swap position.
Li and DiCarlo. Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 2008.
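The predicted selectivity change in (C) can be reproduced with a toy update rule (purely illustrative; not the authors' model). Assume the unit's response is a lookup table and that each saccade drifts the response at the peripheral position toward the response evoked by the image that lands on the fovea:

```python
# Toy model of the temporal-contiguity prediction (illustrative only).
# w[(object, position)] is the unit's response; foveal responses are fixed.
w = {("P", "swap"): 1.0, ("N", "swap"): 0.0,      # initially P-selective
     ("P", "nonswap"): 1.0, ("N", "nonswap"): 0.0,
     ("P", "fovea"): 1.0, ("N", "fovea"): 0.0}
LR = 0.05  # assumed learning rate

def expose(first_obj, position, second_obj):
    """One saccade: `first_obj` at `position`, then `second_obj` at the fovea."""
    w[(first_obj, position)] += LR * (
        w[(second_obj, "fovea")] - w[(first_obj, position)])

for _ in range(100):
    expose("P", "swap", "N")       # swap exposure P -> N
    expose("N", "swap", "P")       # swap exposure N -> P
    expose("P", "nonswap", "P")    # normal exposure P -> P
    expose("N", "nonswap", "N")    # normal exposure N -> N

sel_swap = w[("P", "swap")] - w[("N", "swap")]           # collapses, reverses
sel_nonswap = w[("P", "nonswap")] - w[("N", "nonswap")]  # stays intact
```

Selectivity collapses and ultimately reverses at the swap position while remaining intact at the non-swap position, matching the prediction in the caption.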
Fig. 2. Model architecture and stimuli. An input image is fed into the hierarchical network. The circles in each layer denote the overlapping receptive fields for the SFA-nodes and converge towards the top layer. The same set of steps is applied on each layer, which is visualized on the right hand side.
Franzius, Wilbert, Wiskott. ICANN 2008.
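The slowness objective each SFA node optimizes can be illustrated in its simplest one-layer linear form (the actual model uses hierarchical nodes with nonlinear expansion; this sketch only demonstrates the principle of extracting the slowest unit-variance projection):

```python
import numpy as np

def linear_sfa(x):
    """One-layer linear SFA: the unit-variance projection of x whose
    temporal derivative has minimal variance. x has shape (T, D)."""
    xc = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc.T))
    white = evecs / np.sqrt(evals)        # whitening transform (columns)
    dz = np.diff(xc @ white, axis=0)      # temporal derivative, whitened space
    _, dvecs = np.linalg.eigh(np.cov(dz.T))
    w = white @ dvecs[:, 0]               # direction of smallest derivative variance
    return w, xc @ w

t = np.linspace(0, 4 * np.pi, 500)
slow, fast = np.sin(t), np.sin(20 * t)
x = np.column_stack([slow + 0.5 * fast, slow - 0.5 * fast])  # two mixtures
w, y = linear_sfa(x)
corr = abs(np.corrcoef(y, slow)[0, 1])   # slowest feature tracks the slow source
```

Given two mixtures of a slow and a fast sinusoid, the slowest extracted feature recovers the slow source up to sign and scale.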
Conclusion: Motion Information as a Scaffold · Experimental and Theoretical Evidence for Temporal Learning · Future Directions
Fig. 11. The examples for which color, multiresolution SANE most outperformed cgtg Martin with features radius 5 using match distance 3, and vice-versa.
The results in Figure 10 make it clear that SANE outperforms Martin on these data sets. The examples in Figure 11 illustrate that SANE’s advantage lies in its higher precision. While the Martin detectors have little trouble finding object boundaries, they also detect many other non-object boundaries in the process. In all four data sets, the color, multiresolution SANE f-measures outperform all Martin detectors at all feature radii using the strict matching distance of 1. As the match maximum distance relaxes to 3 and 5, the Martin results improve, especially their recall. At radius 3, the best Martin results only match SANE on the traffic data.
... instance of the same object class. Unlike SANE, which segments each image independently, LOCUS jointly segments a collection of images, allowing information in any image to influence the segmentation of the others. LOCUS provides a good comparison to SANE. Its shape model, which includes a probabilistic template describing the expected overall shape of the objects, is much more global than SANE’s shape model, which learns the relationships between neighboring boundary segments. Furthermore, LOCUS does not attempt to learn a common image model, while SANE segments new images using previously learned models of the image ...
Ross and Kaelbling. Segmentation According to Natural Examples: Learning Static Segmentation from Motion Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Structure from Motion

- Recover 3D structure and camera motion from multiple images without correspondence

Figure 4. Three out of 11 cube images. Although the images were originally taken as a sequence in time, the ordering of the images is irrelevant to our method.

... where ... are the measurements whose assignments will be swapped, and ... are the projections of the features originally assigned to them. To conclude the E-step and compute the virtual measurements in (11), the only thing left to do is to compute the marginal probabilities from the sample. Fortunately, this can be done without explicitly storing the samples by keeping running counts of how many times each measurement is assigned to each feature, and using those counts to compute the marginals. If we define ... to be this count, we have: (15)
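The counting trick described in the excerpt can be sketched as follows (the names and the toy sampler are illustrative, not the paper's notation): instead of storing every sampled assignment, accumulate per-(feature, measurement) counts and normalize by the number of samples to estimate the marginal assignment probabilities.

```python
import random

def marginal_from_samples(sampler, n_features, n_meas, n_samples, seed=0):
    """Estimate marginal assignment probabilities from sampled assignments
    without storing the samples: keep a running count counts[j][k] of how
    many times measurement k was assigned to feature j."""
    rng = random.Random(seed)
    counts = [[0] * n_meas for _ in range(n_features)]
    for _ in range(n_samples):
        assignment = sampler(rng)          # assignment[k] = feature index j
        for k, j in enumerate(assignment):
            counts[j][k] += 1
    # Normalized counts approximate P(measurement k assigned to feature j).
    return [[c / n_samples for c in row] for row in counts]

# Toy sampler: measurement 0 goes to feature 0 with prob. 0.8, else feature 1;
# measurement 1 always goes to feature 1.
def sampler(rng):
    return [0 if rng.random() < 0.8 else 1, 1]

f = marginal_from_samples(sampler, n_features=2, n_meas=2, n_samples=10000)
```

Each column of the estimate sums to one, and the per-measurement marginals converge to the sampler's true assignment probabilities as the sample count grows.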