Georg Gerber
Lecture #6, 2/6/02
Lecture Overview
Euclidean distance
Pearson Linear Correlation
Clustering algorithms
What is clustering?
Why cluster?
Example 1: clustering genes
Example 2: clustering genes
Find general trends in the data, e.g., a group of genes with high expression in twist mutants and not elevated in Toll mutants contains many known neuroectodermal genes (presumably overexpression of twist suppresses ectoderm)
Example 3: clustering samples
How do we define similarity?
(Dis)similarity measures
Euclidean distance
$$d_{euc}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
[Figure panels, not reproduced: d_euc = 0.5846, d_euc = 1.1345, d_euc = 2.6115]
These examples of Euclidean distance match our intuition of dissimilarity pretty well
[Figure panels, not reproduced: d_euc = 1.41, d_euc = 1.22]
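The Euclidean distance defined above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `euclidean_distance` and the two example profiles are hypothetical and are not the data behind the lecture's figures.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical expression profiles over four conditions.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 2.0, 1.0, 4.0]

# The squared differences are 1, 0, 4, 0, so the distance is sqrt(5).
print(euclidean_distance(profile_a, profile_b))
```

Because every coordinate contributes its squared difference, a single condition with a large gap can dominate the distance.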
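Pearson Linear Correlation (PLC)
$$\rho(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
$$d_p = \frac{1 - \rho(x, y)}{2}$$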
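As a rough sketch, the Pearson correlation and the derived dissimilarity d_p = (1 - rho) / 2 could be computed as follows. The function names and the example profiles are hypothetical. Raising an error for a constant profile is a choice made for this sketch, since the correlation is undefined in that case.

```python
import math


def pearson_correlation(x, y):
    """Pearson linear correlation of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(ss_x * ss_y)


def pearson_distance(x, y):
    """Map correlation in [-1, 1] to a dissimilarity in [0, 1]."""
    return (1.0 - pearson_correlation(x, y)) / 2.0


# Hypothetical profiles: same shape at different scales, then mirrored.
up = [1.0, 2.0, 3.0, 4.0]
scaled_up = [2.0, 4.0, 6.0, 8.0]
down = [8.0, 6.0, 4.0, 2.0]

print(pearson_distance(up, scaled_up))  # perfectly correlated: distance near 0
print(pearson_distance(up, down))       # perfectly anti-correlated: distance near 1
```

Unlike Euclidean distance, this measure ignores the scale and offset of the profiles and compares only their shape.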
PLC (cont.)
$\rho$ = 0.0249, so $d_p$ = 0.4876
The green curve is the square of the blue curve; this relationship is not linear, so PLC does not capture it
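The squared-curve observation can be checked numerically. The sketch below uses a hypothetical profile that is symmetric around zero, not the curves from the slide, so the correlation comes out as exactly 0 rather than the 0.0249 quoted above.

```python
import math


def pearson_correlation(x, y):
    """Pearson linear correlation of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical "blue" profile and its square as the "green" profile.
blue = [-2.0, -1.0, 0.0, 1.0, 2.0]
green = [v ** 2 for v in blue]

rho = pearson_correlation(blue, green)
d_p = (1.0 - rho) / 2.0
print(rho, d_p)  # rho is 0 here, so d_p is 0.5
```

The two profiles are completely dependent, yet the Pearson-based distance treats them as unrelated because the dependence is not linear.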
Missing Values
Hierarchical Agglomerative Clustering
Hierarchical Clustering (cont.)
This produces a binary tree or dendrogram
The final cluster is the root and each data item is a leaf
The height of the bars indicates how close the items are
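The bottom-up construction of the dendrogram can be sketched as repeated merging of the two closest clusters. This is a minimal, unoptimized illustration under stated assumptions: it uses Euclidean distance with average linkage, and the function names and example points are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between two clusters of point indices."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points):
    """Return the merges as (cluster_a, cluster_b, height) tuples.

    Each data item starts as its own leaf; the last merge is the root.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of current clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        height = average_linkage(a, b, points)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, height))
    return merges


# Hypothetical one-dimensional data items.
points = [(0.0,), (0.1,), (1.0,), (1.2,), (5.0,)]
merges = agglomerate(points)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

With n data items there are always n - 1 merges. The recorded height of each merge corresponds to the bar height in the dendrogram.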
Hierarchical Clustering Demo
Linkage in Hierarchical Clustering
Average Linkage
Single Linkage
Complete Linkage
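The three linkage rules differ only in how the pairwise distances between two clusters are summarized. The sketch below uses the standard definitions (mean, minimum, and maximum of the pairwise distances). The function names and the toy clusters are hypothetical.

```python
def pairwise_distances(c1, c2, dist):
    """All distances between a member of c1 and a member of c2."""
    return [dist(x, y) for x in c1 for y in c2]


def average_linkage(c1, c2, dist):
    d = pairwise_distances(c1, c2, dist)
    return sum(d) / len(d)


def single_linkage(c1, c2, dist):
    # Distance between the two closest members.
    return min(pairwise_distances(c1, c2, dist))


def complete_linkage(c1, c2, dist):
    # Distance between the two farthest members.
    return max(pairwise_distances(c1, c2, dist))


def abs_diff(x, y):
    return abs(x - y)


# Hypothetical one-dimensional clusters.
cluster_1 = [0.0, 1.0]
cluster_2 = [3.0, 5.0]

print(average_linkage(cluster_1, cluster_2, abs_diff))   # 3.5
print(single_linkage(cluster_1, cluster_2, abs_diff))    # 2.0
print(complete_linkage(cluster_1, cluster_2, abs_diff))  # 5.0
```

Single linkage tends to chain clusters together through close neighbors, while complete linkage favors compact clusters. Average linkage sits between the two.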
Hierarchical Clustering Issues
Leaf Ordering in HC
[Figure, not reproduced: panels labeled Input, Hierarchical clustering, and Optimal ordering]
K-means Clustering
$$Q = \sum_{i=1}^{k} \sum_{x \in C_i} d(x, \mu_i)$$
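One common way to reduce the objective Q is to alternate between assigning each item to its nearest center and recomputing the centers. The slide gives only the objective, so the sketch below rests on assumptions: d is taken to be squared Euclidean distance (for which the cluster mean is the minimizing center), the initial centers are supplied by the caller, and all names and data are hypothetical.

```python
def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(
            range(len(centers)), key=lambda i: squared_distance(p, centers[i])
        )
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centers, max_iter=100):
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center (a choice made for this sketch).
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    clusters = assign(points, centers)
    q = sum(
        squared_distance(p, centers[i])
        for i, c in enumerate(clusters)
        for p in c
    )
    return centers, clusters, q


# Hypothetical data: two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
centers, clusters, q = kmeans(points, initial_centers=[points[0], points[3]])
print(centers)
print(q)
```

The result depends on the initial centers and on the chosen k, and the procedure only finds a local minimum of Q.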
How many clusters do you think there actually are?
[Figure, not reproduced: axis label k]
Self-Organizing Maps
Self-Organizing Maps (cont.)
Self-Organizing Maps (cont.)
Self-Organizing Map Example
We already saw this in the context of the macrophage differentiation data
This is a 4 x 3 SOM and the mean of each cluster is displayed
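To make the 4 x 3 layout concrete, here is a minimal SOM sketch in the usual Kohonen style. The update rule, the linearly decaying learning rate and neighborhood width, the parameter names, and the data are all assumptions made for illustration. The lecture does not specify them here, and this is not the macrophage differentiation data.

```python
import math
import random


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def best_matching_unit(x, nodes):
    """Grid position of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda pos: squared_distance(x, nodes[pos]))


def train_som(data, rows=4, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    rng = random.Random(seed)
    # Initialize every grid node with a copy of a randomly chosen data item.
    nodes = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.01    # neighborhood shrinks
            br, bc = best_matching_unit(x, nodes)
            for (r, c), w in nodes.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull the node toward x, more strongly near the winner.
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes


def cluster_means(data, nodes):
    """Mean of the data items assigned to each occupied grid node."""
    members = {}
    for x in data:
        members.setdefault(best_matching_unit(x, nodes), []).append(x)
    return {
        pos: tuple(sum(col) / len(xs) for col in zip(*xs))
        for pos, xs in members.items()
    }


# Hypothetical two-dimensional data items.
data = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1),
    (1.2, 0.9), (2.0, 2.1), (2.2, 1.9), (3.0, 3.1),
]
nodes = train_som(data)
means = cluster_means(data, nodes)
for pos in sorted(means):
    print(pos, means[pos])
```

Each grid node acts as one cluster, and neighboring nodes on the grid end up with similar weight vectors. That is what makes a display of per-node cluster means readable as a map.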
SOM Issues
Other Clustering Algorithms