Daniel O’Shea
djoshea@stanford.edu
11 March 2010
Overview
Project Prakash
Learning to Recognize Objects
Subjects
Conclusion
Motion Information as a Scaffold
Experimental and Theoretical Evidence for Temporal Learning
Future Directions
Project Prakash
Scientific Opportunity
Learning to Recognize
Fig. 1. Example illustrating how a natural image (a) is typically a collection of many regions of different hues
and luminances (b). The human visual system has to accomplish the task of integrating subsets of these regions
into coherent objects, illustrated in (c).
Central questions
Recovery Subjects
Control Subjects
[Fig. 2 panels: seven tasks labeled A–G ("How many objects?" for A–D and G, "Which object is in front?" for E, "Trace the long curve" for F), and a bar graph of performance (% correct) for the Control Group and subjects S.K., J.A., and P.B.]
Fig. 2. Subjects’ parsing of static images. Seven tasks (a) were used to assess the recently treated subjects’ ability to perform simple image segmentation and shape analysis. The graph (b) shows the performance of these subjects relative to the control subjects on these tasks. “NA” indicates that data are not available for a subject. S.K.’s tracing of a pattern drawn by one of the authors (c) illustrates the fragmented percepts of the recently treated subjects. In the upper row of (d), the outlines indicate the regions of real-world images that S.K. saw as distinct objects. He was unable to recognize any of these images. For comparison, the lower row of (d) shows the segmentation of the same images according to a simple algorithm that agglomerated spatially adjacent regions that satisfied a threshold criterion of similarity in their hue and luminance attributes.
Daniel O’Shea (djoshea@stanford.edu) Visual Parsing After Recovery from Blindness 8 / 19
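The agglomeration algorithm described in the caption can be sketched as a simple flood fill over adjacent pixels. This is an illustrative reconstruction under assumed details (pixels encoded as (hue, luminance) pairs, a single shared threshold), not the authors' implementation:

```python
from collections import deque

def segment(image, threshold):
    """Group spatially adjacent pixels whose hue and luminance each
    differ by no more than `threshold` (flood-fill agglomeration)."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] is not None:
                continue
            labels[sy][sx] = n_regions
            frontier = deque([(sy, sx)])
            while frontier:
                y, x = frontier.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] is None:
                        hue_d = abs(image[ny][nx][0] - image[y][x][0])
                        lum_d = abs(image[ny][nx][1] - image[y][x][1])
                        if hue_d <= threshold and lum_d <= threshold:
                            labels[ny][nx] = n_regions
                            frontier.append((ny, nx))
            n_regions += 1
    return labels, n_regions

# Toy image: a dark left half and a bright right half yield two regions.
img = [[(0.1, 0.2), (0.1, 0.2), (0.5, 0.9), (0.5, 0.9)] for _ in range(3)]
labels, n = segment(img, threshold=0.1)
```

Purely local grouping of this kind reproduces the over-fragmentation pattern: any hue or luminance discontinuity inside a single real-world object splits it into separate regions.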
Tests of Visual Parsing: Static Visual Parsing · Dynamic Visual Parsing · Recovery of Static Object Recognition
- Overfragmented percept evident in object tracing
- Segmentation consistent with a simple algorithm based on adjacent-region similarity in hue and luminance
[Fig. 4 panels: bar graph of performance (% correct) on four tasks A–D for the Control Group and subjects S.K., J.A., and P.B.]
Fig. 4. Recently treated and control subjects’ performance on tasks designed to assess the role of dynamic information in object segregation. “NA” indicates that data are not available for a subject. In the illustrations of the four tasks, the arrows indicate direction of movement.
Fig. 3. Motility ratings of the 50 images used to test object recognition and the recently treated subjects’ ability to recognize these images. Motility
ratings were obtained from 5 normally sighted respondents who were naive as to the purpose of the experiment; the height of the black bar below each
object indicates that object’s average rating on a scale from 1 (very unlikely to be seen in motion) to 5 (very likely to be seen in motion). The circles
indicate correct recognition responses.
Fig. 5. The recently treated subjects’ performance on four tasks with static displays soon after treatment and at follow-up testing after the passage of several months (indicated in the key).
Fig. 1. Experimental design and predictions. (A) IT responses were tested in “Test phase” (green boxes, see text), which alternated with “Exposure phase.” Each exposure phase consisted of 100 normal exposures (50 P→P, 50 N→N) and 100 swap exposures (50 P→N, 50 N→P). Stimulus size was 1.5° (16). (B) Each box shows the exposure-phase design for a single neuron. Arrows show the saccade-induced temporal contiguity of retinal images (arrowheads point to the retinal images occurring later in time, i.e., at the end of the saccade). The swap position was strictly alternated (neuron-by-neuron) so that it was counterbalanced across neurons. (C) Prediction for responses collected in the test phase: If the visual system builds tolerance using temporal contiguity (here driven by saccades), the swap exposure should cause incorrect grouping of two different object images (here P and N). Thus, the predicted effect is a decrease in object selectivity at the swap position that increases with increasing exposure (in the limit, reversing object preference), and little or no change in object selectivity at the non-swap position.
Li and DiCarlo. Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 2008.
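The predicted selectivity change in (C) can be reproduced with a toy update rule (purely illustrative; not the authors' model). Assume the unit's response is a lookup table and that each saccade drifts the response at the peripheral position toward the response evoked by the image that lands on the fovea:

```python
# Toy model of the temporal-contiguity prediction (illustrative only).
# w[(object, position)] is the unit's response; foveal responses are fixed.
w = {("P", "swap"): 1.0, ("N", "swap"): 0.0,      # initially P-selective
     ("P", "nonswap"): 1.0, ("N", "nonswap"): 0.0,
     ("P", "fovea"): 1.0, ("N", "fovea"): 0.0}
LR = 0.05  # assumed learning rate

def expose(first_obj, position, second_obj):
    """One saccade: `first_obj` at `position`, then `second_obj` at the fovea."""
    w[(first_obj, position)] += LR * (
        w[(second_obj, "fovea")] - w[(first_obj, position)])

for _ in range(100):
    expose("P", "swap", "N")       # swap exposure P -> N
    expose("N", "swap", "P")       # swap exposure N -> P
    expose("P", "nonswap", "P")    # normal exposure P -> P
    expose("N", "nonswap", "N")    # normal exposure N -> N

sel_swap = w[("P", "swap")] - w[("N", "swap")]           # collapses, reverses
sel_nonswap = w[("P", "nonswap")] - w[("N", "nonswap")]  # stays intact
```

Selectivity collapses and ultimately reverses at the swap position while remaining intact at the non-swap position, matching the prediction in the caption.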
Fig. 2. Model architecture and stimuli. An input image is fed into the hierarchical network. The circles in each layer denote the overlapping receptive fields for the SFA-nodes and converge towards the top layer. The same set of steps is applied on each layer, which is visualized on the right hand side.
Franzius, Wilbert, Wiskott. ICANN 2008.
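The slowness objective each SFA node optimizes can be illustrated in its simplest one-layer linear form (the actual model uses hierarchical nodes with nonlinear expansion; this sketch only demonstrates the principle of extracting the slowest unit-variance projection):

```python
import numpy as np

def linear_sfa(x):
    """One-layer linear SFA: the unit-variance projection of x whose
    temporal derivative has minimal variance. x has shape (T, D)."""
    xc = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc.T))
    white = evecs / np.sqrt(evals)        # whitening transform (columns)
    dz = np.diff(xc @ white, axis=0)      # temporal derivative, whitened space
    _, dvecs = np.linalg.eigh(np.cov(dz.T))
    w = white @ dvecs[:, 0]               # direction of smallest derivative variance
    return w, xc @ w

t = np.linspace(0, 4 * np.pi, 500)
slow, fast = np.sin(t), np.sin(20 * t)
x = np.column_stack([slow + 0.5 * fast, slow - 0.5 * fast])  # two mixtures
w, y = linear_sfa(x)
corr = abs(np.corrcoef(y, slow)[0, 1])   # slowest feature tracks the slow source
```

Given two mixtures of a slow and a fast sinusoid, the slowest extracted feature recovers the slow source up to sign and scale.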
Conclusion: Motion Information as a Scaffold · Experimental and Theoretical Evidence for Temporal Learning · Future Directions
Fig. 11. The examples for which color, multiresolution SANE most outperformed cgtg Martin with features radius 5 using match distance 3, and vice-versa.
The results in Figure 10 make it clear that SANE outperforms Martin on these data sets. The examples in Figure 11 illustrate that SANE’s advantage lies in its higher precision. While the Martin detectors have little trouble finding object boundaries, they also detect many other non-object boundaries in the process. In all four data sets, the color, multiresolution SANE f-measures outperform all Martin detectors at all feature radii using the strict matching distance of 1. As the match maximum distance relaxes to 3 and 5, the Martin results improve, especially their recall. At radius 3, the best Martin results only match SANE on the traffic data.
... instance of the same object class. Unlike SANE, which segments each image independently, LOCUS jointly segments a collection of images, allowing information in any image to influence the segmentation of the others. LOCUS provides a good comparison to SANE. Its shape model, which includes a probabilistic template describing the expected overall shape of the objects, is much more global than SANE’s shape model, which learns the relationships between neighboring boundary segments. Furthermore, LOCUS does not attempt to learn a common image model, while SANE segments new images using previously learned models of the image ...
Ross and Kaelbling. Segmentation According to Natural Examples: Learning Static Segmentation from Motion Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Structure from Motion

- Recover 3D structure and camera motion from multiple images without correspondence

Figure 4. Three out of 11 cube images. Although the images were originally taken as a sequence in time, the ordering of the images is irrelevant to our method.

... where ... are the measurements whose assignments will be swapped, and ... are the projections of the features originally assigned to them. To conclude the E-step and compute the virtual measurements in (11), the only thing left to do is to compute the marginal probabilities from the sample. Fortunately, this can be done without explicitly storing the samples by keeping running counts of how many times each measurement is assigned to each feature, and using those counts to compute the marginals. If we define ... to be this count, we have: (15)
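The counting trick described in the excerpt can be sketched as follows (the names and the toy sampler are illustrative, not the paper's notation): instead of storing every sampled assignment, accumulate per-(feature, measurement) counts and normalize by the number of samples to estimate the marginal assignment probabilities.

```python
import random

def marginal_from_samples(sampler, n_features, n_meas, n_samples, seed=0):
    """Estimate marginal assignment probabilities from sampled assignments
    without storing the samples: keep a running count counts[j][k] of how
    many times measurement k was assigned to feature j."""
    rng = random.Random(seed)
    counts = [[0] * n_meas for _ in range(n_features)]
    for _ in range(n_samples):
        assignment = sampler(rng)          # assignment[k] = feature index j
        for k, j in enumerate(assignment):
            counts[j][k] += 1
    # Normalized counts approximate P(measurement k assigned to feature j).
    return [[c / n_samples for c in row] for row in counts]

# Toy sampler: measurement 0 goes to feature 0 with prob. 0.8, else feature 1;
# measurement 1 always goes to feature 1.
def sampler(rng):
    return [0 if rng.random() < 0.8 else 1, 1]

f = marginal_from_samples(sampler, n_features=2, n_meas=2, n_samples=10000)
```

Each column of the estimate sums to one, and the per-measurement marginals converge to the sampler's true assignment probabilities as the sample count grows.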