
CogSci 131

Similarity, spaces, and features


Joshua Peterson

Admin
Problem Set 0 grades posted
Problem Set 1 due tomorrow

The rules and symbols idea


Thought is a formal system, on the model of
mathematical logic and computation
Languages are sets of sentences, characterized
by formal theories or grammars
(Learning is hard, and guided by strong innate
constraints that are domain-specific)

Part III: Networks, spaces, and features

Outline
Similarity and generalization
Break

Spaces vs. features

How can we explain typicality?


One answer: reject definitions, and have a
new representation for categories
Prototype theory:
categories are represented by a prototype
other members share a family resemblance
relation to the prototype
typicality is a function of similarity to the
prototype
What exactly counts as similar?

Cognitive science and similarity:
a love-hate relationship

Similarity is radically unconstrained.
depending on what counts as similar, anything can be similar to anything else

Yet . . .
empirically, similarity judgments show
strong and systematic regularities
theoretically, similarity plays a crucial role
in many accounts of how the mind works

Similarity is everywhere
Perceptual grouping
Categorization
Reasoning
given that lions have protein K in their blood, what other animals have protein K?
Modeling language
Modeling similarity
Can we come up with a formal account
of what makes two things similar?
Two approaches
spaces (Shepard)
features (Tversky)

Roger Shepard
Developed methods for identifying mental structure from behavior
Tried to find universal laws of cognition
(Rumelhart Prize winner, 2005)

How a cognitive psychologist came to seek universal laws, Shepard (2004)
[one reason] is my unwillingness to be satisfied with any proposed psychological principle whose sole justification is that it fits all the available empirical evidence, whether behavioral or neurophysiological. I crave, in addition, a reason that that behavioral principle (or that associated neural structure) should have the particular form that it does, rather than some other. [I believe] that if, as I fervently hope, psychological principles are not merely arbitrary, some may be shown to have arisen as accommodations to universal features of the world. If this is so, we might aspire to a science of the mind that, like the physical and mathematical sciences, has universal laws.

What level of analysis is this?

[Figure: mental rotation; response time increases linearly with angle of rotation (Shepard & Metzler, 1971)]

Roger Shepard

Shepard's strategy
Similarity is a fuzzy notion; what is a better way of thinking about these judgments?
what is the underlying computational problem?

Generalization: what is the probability that y will possess a property, given that x does?
Example 1: Is y an animal if we know x is?
Example 2: Is y poisonous if x is?

Spatial representations

[Figure: stimuli x and y as points in a space with dimensions hue and brightness; a consequential region surrounds x]

Best way to represent similarity between points?
One way to do it is: distance!
Euclidean distance = shortest path

Shepard was interested in psychological space, and NOT physical parameter space!
How do we obtain a psychological space?
Shepard discovered how! to be continued
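As a toy illustration of distance in such a space (the coordinates below are invented, not measurements from any actual psychological space), Euclidean distance between two points can be computed as:

```python
import math

# Hypothetical coordinates for two color stimuli in a 2-D space
# (dimensions: hue, brightness); values are made up for illustration.
x = (0.2, 0.7)
y = (0.5, 0.3)

# Euclidean distance: the length of the straight-line (shortest) path
d = math.dist(x, y)
print(round(d, 3))  # 0.5
```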

Psychological similarity

Shepard's universal law:

S(a,b) = e^(-d(a,b))

similarity decreases exponentially with distance in psychological space
WHY?
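The law is just one line of code; a minimal sketch, with arbitrary sample distances:

```python
import math

# Shepard's universal law: similarity decays exponentially
# with distance in psychological space.
def similarity(d):
    """S(a, b) = exp(-d(a, b)) for a given psychological distance d."""
    return math.exp(-d)

for d in [0.0, 0.5, 1.0, 2.0]:
    print(f"d = {d}: S = {similarity(d):.3f}")
```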

Explaining the universal law

Single dimension (simple case)
We consider only integer stimulus values

Explaining the universal law

Black dot = stimulus x


Example: hormone level

Explaining the universal law

Y-axis:
Probability that y is in
the consequential
region, given x is.

Explaining the universal law

Possible
hypotheses
(consequential
regions) are
overlapping
intervals of all
possible sizes

Explaining the universal law

Example 1
Interval
includes only
the original
value of 60

Explaining the universal law

Example 2
Interval
includes 58-60
(size of 3)

Explaining the universal law

Smaller
intervals more
probable than
large intervals
(bar thickness)

All probabilities
sum to 1

Explaining the universal law


p(y ∈ C | x) = 1

Case 1:
Stimulus y also
has a value of 60
y is inside all
possible intervals
that include x

Probability that y
is in region is 1.

Explaining the universal law


p(y ∈ C | x) shrinks

Case 2:
Stimulus y has a
value of 61
y is inside 15 out
of 21 bars
y is inside 6 fewer
bars than before
Total probability
of being in the
consequential
region shrinks

Explaining the universal law


Case 3:
Stimulus y has a
value of 62
y is inside 5 fewer
bars than before

Explaining the universal law


Case 4:
Stimulus y has a
value of 63
y is inside 4 fewer
bars than before

Explaining the universal law

Summary
As the y value gets further away from the x value, the number of intervals containing it decreases, BUT at a decreasing rate (lost 6, 5, 4 bars).

Question: What function decreases at a decreasing rate? Answer: f(x) = e^(-x)
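The counting argument can be checked directly. A minimal sketch, assuming (as in the slides) a hypothesis space of integer intervals of sizes 1 through 6 containing x:

```python
def intervals_containing(x, y, max_size=6):
    """Count the integer intervals of size 1..max_size that contain
    both x and y. An interval of s consecutive integers contains x in
    s positions, and contains y as well in max(0, s - |x - y|) of them."""
    return sum(max(0, s - abs(x - y)) for s in range(1, max_size + 1))

# number of intervals containing y, for y at distance 0, 1, 2, 3 from x = 60
counts = [intervals_containing(60, 60 + d) for d in range(4)]
print(counts)   # [21, 15, 10, 6]

# losses between successive distances shrink, as on the slides
losses = [a - b for a, b in zip(counts, counts[1:])]
print(losses)   # [6, 5, 4]
```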


Break

Up next:
Spaces vs. features

Psychological similarity

Shepard's universal law:

S(a,b) = e^(-d(a,b))

similarity decreases exponentially with distance in psychological space

Amos Tversky
Famous for his work with
Daniel Kahneman on
heuristics and biases

Pursued an axiomatic
approach to modeling

(Grawemeyer Prize winner, 2003)

Properties of distances
From the metric axioms:
symmetry: d(a,b) = d(b,a)
triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)

Constraints on neighborhood relations (in low-dimensional spaces)

Similarity can be asymmetric


Which direction of comparison is preferred?
A camel is like a horse
A horse is like a camel

(Tversky, 1977)


Violation of triangle inequality

d(a,c) ≤ d(a,b) + d(b,c)

Can find similarity judgments that violate this:

(Tversky, 1977)

Constraints on neighborhoods
Nearest neighbors: to how many points can a point be the nearest neighbor?

1 dimension: 2
2 dimensions: 5

These constraints may not mix well with similarity
(Tversky & Hutchinson, 1986)
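The 2-D bound can be illustrated numerically: five points spaced 72° apart on a unit circle all have the circle's center as their nearest neighbor. A small sketch (the geometric setup is my illustration, not from the paper):

```python
import math

# Five points 72 degrees apart on a unit circle around the origin:
# adjacent outer points are 2*sin(36°) ≈ 1.176 apart, farther than their
# distance 1 to the center, so the center is each one's nearest neighbor.
center = (0.0, 0.0)
outer = [(math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5))
         for k in range(5)]
points = [center] + outer

def nearest(p, candidates):
    """Return the nearest point to p among candidates, excluding p itself."""
    return min((q for q in candidates if q != p),
               key=lambda q: math.dist(p, q))

assert all(nearest(p, points) == center for p in outer)
print("the center is the nearest neighbor of all 5 outer points")
```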

Constraints on neighborhoods
Evidence: Hierarchical structures require some concepts to be the nearest neighbors of many others

[Figure: fruit as the hub, linked to apple, orange, peach, and pear]

Fruit should be more similar to apple than apple is to orange. Problem?
(Tversky & Hutchinson, 1986)

Properties of distances
From the metric axioms:
symmetry: d(a,b) = d(b,a)
triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)

Constraints on neighborhood relations (in low-dimensional spaces)

If similarity lacks these properties, it must be based on something other than distance

Tversky's Approach
Borrows from set theory
Stimuli are sets of features
banana = {yellow, curved, …}

Intuition
Two stimuli (two sets of features) can have overlap
However, distinctive or salient features also matter

Tversky's Approach
Identifies axioms that must be satisfied:
Matching
Monotonicity
Independence
Solvability
Invariance

Derives the contrast model to satisfy these

Tversky's contrast model

S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

A ∩ B: common features
A − B: distinctive features of a
B − A: distinctive features of b

A: set of features of a
B: set of features of b
f: function from sets to numbers (weights features by salience)
θ, α, β: free parameters, all ≥ 0

Similarity increases with shared features and decreases with distinctive features (of both a and b).
Similarity is a linear combination of the common and distinctive features.
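A minimal sketch of the contrast model in code, taking f to be set size (every feature equally salient) and using made-up feature sets and parameter values:

```python
def contrast_similarity(A, B, f=len, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model:
    S(a,b) = theta*f(A ∩ B) - alpha*f(A - B) - beta*f(B - A)."""
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)

# hypothetical feature sets, for illustration only
banana = {"yellow", "curved", "fruit", "sweet"}
lemon = {"yellow", "fruit", "sour"}

# common: {yellow, fruit}; distinctive of banana: {curved, sweet}; of lemon: {sour}
print(contrast_similarity(banana, lemon))  # 1.0*2 - 0.5*2 - 0.5*1 = 0.5
```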

Tversky's contrast model

S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

θ, α, β: free parameters, all ≥ 0
Case α = 0, β = 0: only shared features matter
Case θ = 0: only distinctive features matter

3 free parameters = good or bad?
Good: specifies a family of models
Bad: might be too underconstrained.

Tversky on asymmetry
S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

Empirical finding: the more prominent, salient, or prototypical stimulus (a) is the better target of similarity
We say the variant is like the prototype
Assume a = prototype, b = variant
So, we want: S(b,a) > S(a,b)

S(b,a) − S(a,b) = (α − β)[f(A − B) − f(B − A)]

S(b,a) > S(a,b) under two conditions:
α > β
f(A − B) > f(B − A): f(prototype) > f(variant), i.e., more features or more highly weighted features for the prototype than the variant
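Plugging in numbers shows the asymmetry. A hedged sketch: the feature sets are invented, with α > β and the prototype's features a superset of the variant's, and f taken to be set size:

```python
def S(A, B, theta=1.0, alpha=0.8, beta=0.2):
    """Contrast model with f = set size and alpha > beta."""
    return theta * len(A & B) - alpha * len(A - B) - beta * len(B - A)

prototype = {"f1", "f2", "f3", "f4", "f5"}  # a: richer feature set
variant = {"f1", "f2", "f3"}                # b: subset of prototype's features

# "the variant is like the prototype" beats the reverse direction:
print(S(variant, prototype))  # 1.0*3 - 0.8*0 - 0.2*2 = 2.6
print(S(prototype, variant))  # 1.0*3 - 0.8*2 - 0.2*0 = 1.4
```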

Support for assumptions?

α > β: arbitrary assumption

f(prototype) > f(variant):
Tversky presents no direct evidence
some counter-evidence: S(bat, mouse) > S(mouse, bat), so mouse should be the prototype, but people list more distinctive features for bat (e.g., smelly, nocturnal, jungle) than for mouse (e.g., paws, tail, fields)
a counter-intuition: atypical entities have more unusual features (e.g., bats are mice that fly)

Summary
Similarity (or generalization) can sometimes
be described as decreasing exponentially
with distance in psychological space
But Tversky's arguments work against a
simple translation from distance to similarity
different kinds of representations for different
kinds of stimuli?

Next week
Discovering mental representations
using models of similarity as the basis for
methods for finding spaces/features
