
CogSci 131

Similarity, spaces, and features


Joshua Peterson

Admin
Problem Set 0 grades posted
Problem Set 1 due tomorrow

The rules and symbols idea


Thought is a formal system, on the model of
mathematical logic and computation
Languages are sets of sentences, characterized
by formal theories or grammars
(Learning is hard, and guided by strong innate
constraints that are domain-specific)

Part III: Networks, spaces, and features

Outline
Similarity and generalization
Break

Spaces vs. features

How can we explain typicality?


One answer: reject definitions, and have a
new representation for categories
Prototype theory:
categories are represented by a prototype
other members share a family resemblance
relation to the prototype
typicality is a function of similarity to the
prototype
What exactly counts as similar?

Cognitive science and similarity:
a love-hate relationship

Similarity is radically unconstrained.
depending on what counts as similar, anything can be similar to anything else

Yet . . .
empirically, similarity judgments show
strong and systematic regularities
theoretically, similarity plays a crucial role
in many accounts of how the mind works

Similarity is everywhere
Perceptual grouping
Categorization
Reasoning
given that lions have protein K in their blood, what other animals have protein K?
Modeling language
Modeling similarity
Can we come up with a formal account
of what makes two things similar?
Two approaches
spaces (Shepard)
features (Tversky)

Roger Shepard
Developed methods for identifying mental structure from behavior
Tried to find universal laws of cognition
(Rumelhart Prize winner, 2005)

How a cognitive psychologist came to seek universal laws, Shepard (2004)
[one reason] is my unwillingness to be satisfied with any proposed psychological principle whose sole justification is that it fits all the available empirical evidence, whether behavioral or neurophysiological. I crave, in addition, a reason that that behavioral principle (or that associated neural structure) should have the particular form that it does, rather than some other. [I believe] that if, as I fervently hope, psychological principles are not merely arbitrary, some may be shown to have arisen as accommodations to universal features of the world. If this is so, we might aspire to a science of the mind that, like the physical and mathematical sciences, has universal laws.

What level of analysis is this?

[Figure: mental rotation; response time increases linearly with angle of rotation (Shepard & Metzler, 1971)]

Roger Shepard

Shepard's strategy
Similarity is a fuzzy notion; what is a better way of thinking about these judgments?
what is the underlying computational problem?

Generalization: what is the probability that y will possess a property, given that x does?
Example 1: Is y an animal if we know x is?
Example 2: Is y poisonous if x is?

Spatial representations

[Figure: stimuli x and y as points in a space with dimensions hue and brightness; a consequential region surrounds x]

Best way to represent similarity between points?
One way to do it is: distance!
Euclidean distance = shortest path

Shepard was interested in psychological space, and NOT physical parameter space!
How do we obtain a psychological space?
Shepard discovered how! to be continued
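As a toy illustration of distance in such a space (the coordinates below are invented, not measurements from any actual psychological space), Euclidean distance between two points can be computed as:

```python
import math

# Hypothetical coordinates for two color stimuli in a 2-D space
# (dimensions: hue, brightness); values are made up for illustration.
x = (0.2, 0.7)
y = (0.5, 0.3)

# Euclidean distance: the length of the straight-line (shortest) path
d = math.dist(x, y)
print(round(d, 3))  # 0.5
```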

Psychological similarity

Shepard's universal law:

S(a,b) = e^(-d(a,b))

similarity decreases exponentially with distance in psychological space
WHY?
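The law is just one line of code; a minimal sketch, with arbitrary sample distances:

```python
import math

# Shepard's universal law: similarity decays exponentially
# with distance in psychological space.
def similarity(d):
    """S(a, b) = exp(-d(a, b)) for a given psychological distance d."""
    return math.exp(-d)

for d in [0.0, 0.5, 1.0, 2.0]:
    print(f"d = {d}: S = {similarity(d):.3f}")
```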

Explaining the universal law

Single dimension (simple case)
We consider only integer stimulus values

Explaining the universal law

Black dot = stimulus x


Example: hormone level

Explaining the universal law

Y-axis:
Probability that y is in
the consequential
region, given x is.

Explaining the universal law

Possible
hypotheses
(consequential
regions) are
overlapping
intervals of all
possible sizes

Explaining the universal law

Example 1
Interval
includes only
the original
value of 60

Explaining the universal law

Example 2
Interval
includes 58-60
(size of 3)

Explaining the universal law

Smaller
intervals more
probable than
large intervals
(bar thickness)

All probabilities
sum to 1

Explaining the universal law


p(y ∈ C | x) = 1

Case 1:
Stimulus y also
has a value of 60
y is inside all
possible intervals
that include x

Probability that y
is in region is 1.

Explaining the universal law


p(y ∈ C | x) shrinks

Case 2:
Stimulus y has a
value of 61
y is inside 15 out
of 21 bars
y is inside 6 fewer
bars than before
Total probability
of being in the
consequential
region shrinks

Explaining the universal law


Case 3:
Stimulus y has a
value of 62
y is inside 5 fewer
bars than before

Explaining the universal law


Case 4:
Stimulus y has a
value of 63
y is inside 4 fewer
bars than before

Explaining the universal law

Summary
As the y value gets further away from the x value, the number of intervals containing it decreases, BUT at a decreasing rate (lost 6, 5, 4 bars).

Question: What function decreases at a decreasing rate? Answer: f(x) = e^(-x)
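The counting argument can be checked directly. A minimal sketch, assuming (as in the slides) a hypothesis space of integer intervals of sizes 1 through 6 containing x:

```python
def intervals_containing(x, y, max_size=6):
    """Count the integer intervals of size 1..max_size that contain
    both x and y. An interval of s consecutive integers contains x in
    s positions, and contains y as well in max(0, s - |x - y|) of them."""
    return sum(max(0, s - abs(x - y)) for s in range(1, max_size + 1))

# number of intervals containing y, for y at distance 0, 1, 2, 3 from x = 60
counts = [intervals_containing(60, 60 + d) for d in range(4)]
print(counts)   # [21, 15, 10, 6]

# losses between successive distances shrink, as on the slides
losses = [a - b for a, b in zip(counts, counts[1:])]
print(losses)   # [6, 5, 4]
```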


Break

Up next:
Spaces vs. features

Psychological similarity

Shepard's universal law:

S(a,b) = e^(-d(a,b))

similarity decreases exponentially with distance in psychological space

Amos Tversky
Famous for his work with
Daniel Kahneman on
heuristics and biases

Pursued an axiomatic
approach to modeling

(Grawemeyer Prize winner, 2003)

Properties of distances
From the metric axioms:
symmetry: d(a,b) = d(b,a)
triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)

Constraints on neighborhood relations (in low-dimensional spaces)

Similarity can be asymmetric


Which direction of comparison is preferred?
A camel is like a horse
A horse is like a camel

(Tversky, 1977)


Violation of triangle inequality

d(a,c) ≤ d(a,b) + d(b,c)

Can find similarity judgments that violate this:

(Tversky, 1977)

Constraints on neighborhoods
Nearest neighbors: to how many points can a point be the nearest neighbor?

1 dimension: 2
2 dimensions: 5

These constraints may not mix well with similarity
(Tversky & Hutchinson, 1986)
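The 2-D bound can be illustrated numerically: five points spaced 72° apart on a unit circle all have the circle's center as their nearest neighbor. A small sketch (the geometric setup is my illustration, not from the paper):

```python
import math

# Five points 72 degrees apart on a unit circle around the origin:
# adjacent outer points are 2*sin(36°) ≈ 1.176 apart, farther than their
# distance 1 to the center, so the center is each one's nearest neighbor.
center = (0.0, 0.0)
outer = [(math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5))
         for k in range(5)]
points = [center] + outer

def nearest(p, candidates):
    """Return the nearest point to p among candidates, excluding p itself."""
    return min((q for q in candidates if q != p),
               key=lambda q: math.dist(p, q))

assert all(nearest(p, points) == center for p in outer)
print("the center is the nearest neighbor of all 5 outer points")
```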

Constraints on neighborhoods
Evidence: Hierarchical structures require some concepts to be the nearest neighbors of many others

[Figure: fruit as the hub, linked to apple, orange, peach, and pear]

Fruit should be more similar to apple than apple is to orange. Problem?
(Tversky & Hutchinson, 1986)

Properties of distances
From the metric axioms:
symmetry: d(a,b) = d(b,a)
triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)

Constraints on neighborhood relations (in low-dimensional spaces)

If similarity lacks these properties, it must be based on something other than distance

Tversky's Approach
Borrows from set theory
Stimuli are sets of features
banana = {yellow, curved, …}

Intuition
Two stimuli (two sets of features) can have overlap
However, distinctive or salient features also matter

Tversky's Approach
Identifies axioms that must be satisfied:
Matching
Monotonicity
Independence
Solvability
Invariance

Derives the contrast model to satisfy these

Tversky's contrast model

S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

A ∩ B: common features
A − B: distinctive features of a
B − A: distinctive features of b

A: set of features of a
B: set of features of b
f: function from sets to numbers (weights features by salience)
θ, α, β: free parameters, all ≥ 0

Similarity increases with shared features and decreases with distinctive features (of both a and b).
Similarity is a linear combination of the common and distinctive features.
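A minimal sketch of the contrast model in code, taking f to be set size (every feature equally salient) and using made-up feature sets and parameter values:

```python
def contrast_similarity(A, B, f=len, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model:
    S(a,b) = theta*f(A ∩ B) - alpha*f(A - B) - beta*f(B - A)."""
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)

# hypothetical feature sets, for illustration only
banana = {"yellow", "curved", "fruit", "sweet"}
lemon = {"yellow", "fruit", "sour"}

# common: {yellow, fruit}; distinctive of banana: {curved, sweet}; of lemon: {sour}
print(contrast_similarity(banana, lemon))  # 1.0*2 - 0.5*2 - 0.5*1 = 0.5
```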

Tversky's contrast model

S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

θ, α, β: free parameters, all ≥ 0
Case α = 0, β = 0: only shared features matter
Case θ = 0: only distinctive features matter

3 free parameters = good or bad?
Good: specifies a family of models
Bad: might be too underconstrained.

Tversky on asymmetry
S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

Empirical finding: the more prominent, salient, or prototypical stimulus (a) is the better target of similarity
We say the variant is like the prototype
Assume a = prototype, b = variant
So, we want: S(b,a) > S(a,b)

S(b,a) − S(a,b) = (α − β)[f(A − B) − f(B − A)]

S(b,a) > S(a,b) under two conditions:
α > β
f(A − B) > f(B − A): f(prototype) > f(variant), i.e., more features or more highly weighted features for the prototype than the variant
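Plugging in numbers shows the asymmetry. A hedged sketch: the feature sets are invented, with α > β and the prototype's features a superset of the variant's, and f taken to be set size:

```python
def S(A, B, theta=1.0, alpha=0.8, beta=0.2):
    """Contrast model with f = set size and alpha > beta."""
    return theta * len(A & B) - alpha * len(A - B) - beta * len(B - A)

prototype = {"f1", "f2", "f3", "f4", "f5"}  # a: richer feature set
variant = {"f1", "f2", "f3"}                # b: subset of prototype's features

# "the variant is like the prototype" beats the reverse direction:
print(S(variant, prototype))  # 1.0*3 - 0.8*0 - 0.2*2 = 2.6
print(S(prototype, variant))  # 1.0*3 - 0.8*2 - 0.2*0 = 1.4
```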

Support for assumptions?

α > β: arbitrary assumption

f(prototype) > f(variant):
Tversky presents no direct evidence
some counter-evidence: S(bat, mouse) > S(mouse, bat), so mouse should be the prototype, but people list more distinctive features for bat (e.g., smelly, nocturnal, jungle) than for mouse (e.g., paws, tail, fields)
a counter-intuition: atypical entities have more unusual features (e.g., bats are mice that fly)

Summary
Similarity (or generalization) can sometimes
be described as decreasing exponentially
with distance in psychological space
But Tversky's arguments work against a
simple translation from distance to similarity
different kinds of representations for different
kinds of stimuli?

Next week
Discovering mental representations
using models of similarity as the basis for
methods for finding spaces/features
