
Chapter 19. A Quick Look at Machine Learning

def dissimilarity(clusters):
    totDist = 0.0
    for c in clusters:
        totDist += c.variance()
    return totDist

def trykmeans(examples, exampleType, numClusters, numTrials,
              verbose = False):
    """Calls kmeans numTrials times and returns the result with the
       lowest dissimilarity"""
    best = kmeans(examples, exampleType, numClusters, verbose)
    minDissimilarity = dissimilarity(best)
    for trial in range(1, numTrials):
        clusters = kmeans(examples, exampleType, numClusters, verbose)
        currDissimilarity = dissimilarity(clusters)
        if currDissimilarity < minDissimilarity:
            best = clusters
            minDissimilarity = currDissimilarity
    return best

Figure 19.8 Finding the best k-means clustering
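
To see how the dissimilarity measure ranks candidate clusterings, consider the following minimal sketch. It assumes that the variance of a cluster is the sum of the squared Euclidean distances from its members to its centroid, and it represents a cluster as a plain list of points rather than as the cluster objects used in the figure. The helper names sq_dist, centroid, variance, and best_of are hypothetical and exist only for this illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def variance(points):
    """Assumed definition: sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def dissimilarity(clusters):
    """Total variance over all clusters (lower means tighter clusters)."""
    return sum(variance(c) for c in clusters)

def best_of(candidates):
    """Pick the candidate clustering with the lowest dissimilarity."""
    return min(candidates, key=dissimilarity)

# Two ways of splitting the same four points into two clusters.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(dissimilarity(tight))           # 4.0
print(dissimilarity(mixed))           # 100.0
print(best_of([mixed, tight]) is tight)  # True
```

The clustering that groups nearby points together has the lower total variance, which is why repeating a randomized clustering several times and keeping the result with the lowest dissimilarity tends to yield a better answer than a single run.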

19.6 A Contrived Example


Figure 19.9 contains code that generates, plots, and clusters examples drawn
from two distributions.
The function genDistributions generates a list of n examples with two-dimensional feature vectors. The values of the elements of these feature vectors
are drawn from normal distributions.
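
The idea of drawing each feature from a normal distribution can be sketched as follows. This is not the code of Figure 19.9. The function name gen_points, its parameters, and the use of plain (name, features) pairs in place of example objects are assumptions made so that the sketch is self-contained.

```python
import random

def gen_points(x_mean, x_sd, y_mean, y_sd, n, name_prefix, rng=random):
    """Return n (name, [x, y]) pairs whose x and y values are drawn
       independently from normal distributions.
       Hypothetical helper; rng may be the random module or a
       random.Random instance (useful for reproducible runs)."""
    samples = []
    for i in range(n):
        x = rng.gauss(x_mean, x_sd)
        y = rng.gauss(y_mean, y_sd)
        samples.append((name_prefix + str(i), [x, y]))
    return samples

# A seeded generator makes the run repeatable.
rng = random.Random(0)
points = gen_points(0.0, 1.0, 0.0, 1.0, 5, "A", rng)
for name, features in points:
    print(name, features)
```

Passing a seeded random.Random instance is a design choice for this sketch. It makes the generated examples the same on every run, which is convenient when comparing clusterings.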
The function plotSamples plots the feature vectors of a set of examples. It uses
another PyLab plotting feature that we have not yet seen: the function annotate
is used to place text next to points on the plot. The first argument is the text,
the second argument the point with which the text is associated, and the third
argument the location of the text relative to the point with which it is associated.
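
A minimal sketch of labeling points with annotate appears below. It uses the annotate method of a matplotlib Axes object, which takes the same arguments as the PyLab function. The function name plot_labeled_points and the sample data are hypothetical. Note one assumption in particular: the sketch passes textcoords="offset points" so that xytext is interpreted as an offset from the point. Without that argument, xytext is interpreted in the same coordinates as xy, so a relative placement has to be computed by adding an offset to the point's coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

def plot_labeled_points(points):
    """Plot each (label, (x, y)) pair and place its label next to it.
       Hypothetical helper written for this illustration."""
    fig, ax = plt.subplots()
    for label, (x, y) in points:
        ax.plot(x, y, "k^")              # black triangle marker
        ax.annotate(label,               # the text
                    xy=(x, y),           # the point being labeled
                    xytext=(6, -4),      # offset of the text from the point
                    textcoords="offset points")
    return fig, ax

fig, ax = plot_labeled_points([("A0", (0.2, 1.1)),
                               ("A1", (0.9, 0.4)),
                               ("B0", (3.1, 2.0))])
```

To view the result, call fig.savefig with a file name, or switch to an interactive backend and call plt.show().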
The function contrivedTest uses genDistributions to create two distributions of
ten examples each with the same standard deviation but different means, plots
the examples using plotSamples, and then clusters them using trykmeans.
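
The overall flow of such a test (generate two groups, then cluster them) can be sketched without the rest of the chapter's code. Everything in the sketch is an assumption made for illustration: simple_kmeans is a bare-bones stand-in for the chapter's kmeans, examples are plain (name, point) pairs, plotting is omitted, and the two means are deliberately placed far apart so that the outcome is easy to check.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def simple_kmeans(examples, k, rng, max_iters=100):
    """Bare-bones k-means over (name, point) pairs; a stand-in only."""
    # Start from k distinct examples chosen at random.
    centroids = [pt for _, pt in rng.sample(examples, k)]
    assignment = None
    for _ in range(max_iters):
        # Assign each example to its nearest centroid.
        new_assignment = [min(range(k),
                              key=lambda j: sq_dist(pt, centroids[j]))
                          for _, pt in examples]
        if new_assignment == assignment:
            break  # no example changed cluster, so we have converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [pt for (_, pt), a in zip(examples, assignment)
                       if a == j]
            if members:
                centroids[j] = mean_point(members)
    clusters = [[] for _ in range(k)]
    for ex, a in zip(examples, assignment):
        clusters[a].append(ex)
    return clusters

def contrived_sketch(seed=0):
    """Two groups of ten examples: same standard deviation (1.0),
       different means ((0, 0) and (100, 100))."""
    rng = random.Random(seed)
    group_a = [("A" + str(i), (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
               for i in range(10)]
    group_b = [("B" + str(i), (rng.gauss(100.0, 1.0), rng.gauss(100.0, 1.0)))
               for i in range(10)]
    return simple_kmeans(group_a + group_b, 2, rng)

clusters = contrived_sketch()
for c in clusters:
    print(sorted(name for name, _ in c))
```

With means this far apart, each cluster ends up containing the examples from exactly one group. When the means are close relative to the standard deviation, the groups overlap and a single run can give a less satisfying split, which is the situation in which trying several runs and keeping the best one helps.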
