You are on page 1of 10

Algorithm AS 136: A K-Means Clustering Algorithm

Author(s): J. A. Hartigan and M. A. Wong


Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, No. 1 (1979)
, pp. 100-108
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2346830
Accessed: 19-12-2015 14:17 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/
info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact support@jstor.org.

Royal Statistical Society and Wiley are collaborating with JSTOR to digitize, preserve and extend access to Journal of the
Royal Statistical Society. Series C (Applied Statistics).

http://www.jstor.org

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

APPLIED STATISTICS

100

FIND WIMUM ENTRY

C
C

6o PIvcr = ,xC
KK = 0

DO 70 I = II, s
K = INDEX(I)
IF (ABS(LU(IC, IIl
PIVOT = ABS(W(K,
KK

C
c
C

C
C

.LE. PIVOT) GOTO70


IIM

70 CONTIlUE
IF (IE *EQ. 0) GOTO 10
SWITCHORDER
ISAVE = INDEX(ltK)
= INDEX('II
INDMEX(KiC)
INDEX(II) = ISAVE
PUT IN COUJimSor LU ONE AT A TIME
IR
IF (INTIA IIBASE(II)
IF (II *EQ. MNGOTO90
J = II + 1
ix) 80 I = it M
K = INDEXVI)
= W(E,
LU(E, II)

80 CONTINUE

II

/ LU(ISAVE,

II)

90 CCNTINUE
EKE = IRCW
RETURN
END

AlgorithmAS 136

A K-MeansClustering
Algorithm
By J. A.

HARTIGAN

and M. A.

WONG

New Haven,Connecticut,
Yale University,
U.S.A.
Keywords: K-MEANS CLUSTERING ALGORITHM; TRANSFER ALGORITHM
LANGUAGE

ISO Fortran
DESCRIPTION AND PURPOSE

The K-meansclustering
algorithmis describedin detailby Hartigan(1975). An efficient
versionof thealgorithmis presentedhere.
The aim of the K-meansalgorithmis to divideM pointsin N dimensionsintoK clusters
sum of squaresis minimized.It is not practicalto requirethatthe
so thatthe within-cluster
solutionhas minimalsum of squares againstall partitions,
exceptwhenM, N are smalland
K = 2. We seek instead"local" optima,solutionssuchthatno movementofa pointfromone
sum of squares.
clusterto anotherwill reducethe within-cluster
METHOD

The algorithmrequiresas inputa matrixof M pointsin N dimensionsand a matrixof


K initialclustercentresin N dimensions.The numberof pointsin clusterL is denotedby
NC(L). D(I, L) is theEuclideandistancebetweenpointI and clusterL. The generalprocedure
is to searchfora K-partitionwithlocallyoptimalwithin-cluster
sum of squares by moving
pointsfromone clusterto another.

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

STATISTICAL

ALGORITHMS

101

Step 1. For each pointI (I = 1,2, ..., M), finditsclosestand secondclosestclustercentres,


IC1(I) and IC2(I) respectively.AssignpointI to clusterICI(I).
Step 2. Update theclustercentresto be the averagesof pointscontainedwithinthem.
Step 3. Initially,all clustersbelongto thelive set.
Step 4. This is the optimal-transfer
(OPTRA) stage:
Considereach pointI (I = 1,2, ..., M) in turn. If clusterL (L = 1,2, ..., K) is updatedin the
last quick-transfer
(QTRAN) stage, then it belongs to the live set throughoutthis stage.
Otherwise,
at each step,itis notin thelivesetifithas notbeen updatedin thelastM optimaltransfersteps. Let point I be in clusterLI. If LI is in the live set, do Step 4a; otherwise,
do Step 4b.
Step4a. Computetheminimumofthequantity,R2 = [NC(L) * D(I, L)2]/[NC(L)+ 1],over
all clustersL (L LI, L= 1,2, ..., K). Let L2 be the clusterwiththe smallestR2. If this
value is greaterthanor equal to [NC(L1) * D(I, Ll)2]/[NC(Ll) - 1], no reallocationis necessary
andL2 is thenewIC2(I). (Note thatthevalue [NC(L1) * D(I,LI)2]/[NC(Ll) - 1] is remembered
and willremainthesameforpointI untilclusterLI is updated.) Otherwise,
pointI is allocated
to clusterL2 and LI is thenew IC2(I). Clustercentresare updatedto be themeansof points
assignedto themif reallocationhas takenplace. The two clustersthat are involvedin the
transfer
of pointI at thisparticularstepare now in theliveset.
Step 4b. This stepis the same as Step 4a, exceptthatthe minimumR2 is computedonly
over clustersin the live set.
Step 5. Stop if the live set is empty.Otherwise,go to Step 6 afterone pass throughthe
data set.
Step 6. This is thequick-transfer
(QTRAN) stage:
Considereach pointI (1 = 1,2, ..., M) in turn. Let LI = IC1(I) and L2 = IC2(I). It is not
necessaryto checkthe pointI ifboth theclustersLI and L2 have not changedin the last M
steps. Computethevalues
RI = [NC(L1) * D(I,L1)2]/[NC(L1)- 1] and R2 = [NC(L2) * D(I,L2)2]/[NC(L2)+ 11.
(As noted earlier,RI is remembered
and will remainthe same untilclusterLI is updated.)
If RI is less thanR2, pointI remainsin clusterL1. Otherwise,switchICl (I) and IC2(I) and
updatethecentresof clustersLI and L2. The twoclustersare also notedfortheirinvolvement
in a transfer
at thisstep.
tookplace in thelast M steps,go to Step4. Otherwise,
Step 7. If no transfer
go to Step 6.
STRUCTURE

SUBROUTINE KMNS (A, M, N, C, K, ICI, IC2, NC, AN1, AN2, NCP, D, ITRAN, LIVE,
ITER, WSS, IFAULT)
Formalparameters
A
Real array(M, N)
M
Integer
N
Integer
Real array(K, N)
C
K
IC1
IC2

Integer
Integerarray(M)
Integerarray(M)

NC
AN1
AN2

Integerarray(K)
Real array(K)
Real array(K)

the data matrix


the numberof points
the numberof dimensions
thematrixof initialclustercentres
the matrixof finalclustercentres
thenumberof clusters
the clustereach pointbelongsto
thisarrayis used to remember
theclusterwhich
each pointis mostlikelyto be transferred
to at
each step
output: the numberof pointsin each cluster
workspace:
workspace:
input:
input:
input:
input:
output:
input:
output:
workspace:

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

102

NCP
D
ITRAN
LIVE
ITE R
WSS
IEA ULT

APPLIED

Integerarray(K)
Real array(M)
Integerarray(K)
Integerarray(K)
Integer
Real array(K)
Integer

STATISTICS

workspace:
workspace:
workspace:
workspace:
input: themaximumnumberof iterationsallowed
output: thewithin-cluster
sumofsquaresofeach cluster
output: see Fault Diagnosticsbelow
FAULT DIAGNOSTICS

IFA ULT -0 No fault


IFA ULT 1 At leastone clusteris emptyaftertheinitialassignment.(A bettersetofinitial
clustercentresis called for)
IFA ULT = 2 The allowedmaximumnumberof iterationsis exceeded
IFA ULT = 3 K is less thanor equal to 1 or greaterthanor equal to M
Auxiliaryalgorithms
The followingauxiliaryalgorithmsare called: SUBROUTINE OPTRA (A, M, N, C, K,
IC1, IC2, NC, AN1, AN2, NCP, D, ITRAN, LIVE, INDEX) and SUBROUTINE QTRAN
(A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP, D, ITRAN, INDEX) whichare included.
RELATED ALGORITHMS

is AS 113(A transfer
fornon-hierarchial
A relatedalgorithm
classification)
given
algorithm
uses swopsas wellas transfers
to tryto overcome
byBanfieldand Bassill(1977). Thisalgorithm
theproblemof local optima;thatis, forall pairsof points,a testis made whetherexchanging
more
theclustersto whichthepointsbelongwillimprovethecriterion.It willbe substantially
expensivethanthepresentalgorithmforlargeM.
AS 58 (Euclideanclusteranalysis)givenby
The presentalgorithmis similarto Algorithm
aim at finding
a K-partition
of thesample,withwithin-cluster
Sparks(1973). Bothalgorithms
sum of squares whichcannot be reducedby movingpointsfromone clusterto the other.
of AlgorithmAS 58 does not satisfythis condition. At the
However,the implementation
stage whereeach point is examinedin turnto see if it should be reassignedto a different
cluster,onlythe closestcentreis used to checkforpossiblereallocationof the givenpoint;
a clustercentreother than the closest one may have the smallestvalue of the quantity
+ 1)} dI2,wheren,is thenumberof pointsin cluster/and di is thedistancefromclusterI
{nl/(n1
to the givenpoint. Hence, in general,AlgorithmAS 58 does not providea locally optimal
solution.
are testedon variousgenerateddata sets. The timeconsumedon the
The two algorithms
sum of squares of the resultingK-partitions
IBM 370/158and thewithin-cluster
are givenin
Table 1. While comparingthe entriesof the table, note that AS 58 does not give locally
forthe
optimalsolutionsand so shouldbe expectedto take less time. The WSS are different
two algorithmsbecause theyarriveat different
partitionsof the sets of points. A savingof
about 50 per centin timeoccursin KMNS due to using"live" setsand due to usinga quickiterationsby a factorof 4. Thus,
transfer
stagewhichreducesthenumberof optimaltransfer
KMNS comparedto AS 58 is locallyoptimaland takesless time,especiallywhenthenumber
of clustersis large.
TIME AND AccupAcY
The timeis approximately
equal to CMNKI whereI is the numberof iterations.For an
data structures
IBM 370/158,C = 21 x 10-5sec. However,different
requirequite different
numbersof iterations;and a carefulselectionof initialclustercentreswill also lead to a
considerablesavingin time.
Storagerequirement:
M(N+ 3) + K(N+ 7).

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

STATISTICAL

ALGORITHMS

103

TABLE 1

Time(sec)

WSS

1. M = 1000, N = 10, K = 10

AS 58

63-86

7056-71

2. M = 1000,N = 10, K = 10

AS 58
KMNS

4349
19 11

7779*70
7822-01

3. M = 1000, N = 10, K = 50

AS 58

135-71

4543-82

4. M = 1000,N = 10,K = 50

AS 58
KMNS

95-51

5131P04

5. M = 50, N = 2, K = 8

AS 58

0-17

21-03

(random
spherical
normal)

KMNS

(twowidelyseparatedrandomnormals)

(random
spherical
normal)

KMNS

(twowidely
separated
random
normals)
random
(twowidely
separated
normals)

KMNS

36-66

76-00
57-96

7065-59

456148
5096-23

0-18

21V03

Missingvariatevalues cannotbe handledby thisalgorithm.


The algorithmproducesa clustering
whichis onlylocallyoptimal;thewithin-cluster
sum
of squares may not be decreasedby transferring
a point fromone clusterto another,but
different
partitionsmayhave the same or smallerwithinclustersum of squares.
The numberof iterationsrequiredto attainlocal optimalityis usuallyless than 10.
ADDITIONALCOMMENTS

One way of obtainingthe initial clustercentresis suggestedhere. The points are


firstordered by their distancesto the overall mean of the sample. Then, for cluster
L (L = 1,2, ..., K), the {1 + (L -1) * [M/K]}thpointis chosen to be its initialclustercentre.
In effect,
someK samplepointsarechosenas theinitialclustercentres.Usingthisinitialization
process,it is guaranteedthat no clusterwill be emptyafterthe initialassignmentin the
whichis dependenton theinputorderof thepoints,takes
subroutine.A quick initialization,
the firstK pointsas the initialcentres.
ACKNOWLEDGEMENTS

This researchis supportedby National ScienceFoundationGrantMCS75-08374.


REFERENCES

AS113. A transfer
C. F. and BASSILL,L. C. (1977). Algorithm
algorithm
fornon-hierarchical
classification.
Appl.Statist.,26, 206-210.
New York: Wiley.
HARTIGAN,J.A. (1975). Clustering
Algorithms.
AS 58. Euclideanclusteranalysis.Appl.Statist.,22, 126-130.
SPARKS,D. N. (1973). Algorithm

BANFIELD,

*
C

SUBROUTINEKIRTS(A, M, N, C, K, ICI, ICZ, NC, ANI, AN2, NCP,


D, ITRAN, LIVE, ITER, WSS, IFAULT)

ALGORITHMAS 136 APPL. STATIST. (1979) VOL.28, NO.1

C
C

DIVIDE M POINTS IN N-DIMENSIONALSPACE INTO K CWSTERS


SO THIATTHE WITHIN CUJSTERSUM OF SQUARESIS MINIMIZED.

C
C
C

DIMENSION A(M, N, ICI(Ml, IC2(M), D(Ml


DIMENSIONC(K, N), NC(K), AN1(K), AN2(K), NCP(K)
DIMENSION ITRAN(K), LIVE(K), WSStK), DT(2)
DEFINE BIG TO BE A VERY LARGEPOSITIVE NUMBER
DATABIG /1,0E1O/

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

104

APPLIED
IFAULT = 3

C
C
C
C

IF

(I

LE.

OR, K

FOR EACH POITr I,


AND IC2(I).
IC1(I)

STATISTICS

GE, M) RETURN
FIND ITS TWO CLOSEST
ASSIGN IT TO IC1(I).

CENTRES,

DO 50 I = 1, M
= 1
ICl(I)
= 2
IC2(I)
DO 10 IL-- 1, 2
= 0.0
DT(IL)
DO 10 J = 1, N;
DA = A(I,
J) - C(IL,
J)
= DT(IL)
DT(IL)
+ DA * DA
10 COtNTINUE
IF (I)T(1)
.
T(2!)) GOTO 2o
.
= 2
ICI(I)
=
1
IC2(I)
TEMP = DT(1
DT(1)
DT(2)
= TEMP
DT()
20 DO 50 L = 3, K
DB = 0.0

DO 30 J = 1, NJ

C
C
C
C

C
C

c
C
C
c
C
C
C
C

DC = A(I,
J) - C(L, J)
DB = DB + DC * DC
IF (DO .GE. DT(2))
GOTO 50
30 CONTINUE
IF (DB *LT. DT(1)*
GCTO 40
= D
DT(2)
IC2(I
= L
GOTO 50
= DT(1l
40 DT(2)
= ICI(I)
IC2(I)
=
DO
DT(l)
= L
IC1(I)
50 C0NTIN4UE
UPDATE CUISTER CENTRES TO BE TIIE AVERAGE
(OF POINTS CONTAINED WITHIN THEM
DO 70 L
1, K
NC(L) = 0
= 1, tN
DO 6o
6o C(L, J) = 0.0
70 CONTINUE
DO 90 I = 1, M
L = IC(I)
TIC(L) = NC(L) + i
D)O So 3 = 1, 1N
80 C(L, J) = C(L, J) + A(I,
90 CONTINTUE

3)

CHIECK TO SEE IF THIERE IS

ANY EMPTY CLUSTER AT THIS

STAGE

IFAULT = I
K
DO 100 L =1
IF (NC(L
*EQ. 01 RETURN
100 CONTINUE
IFAULT = 0
DO 120 L = 1, IC
AA = NC(L)
1, N
DO 110 J
110 C(L, J) - C(L, J) / AA
INITIALIZE
ANI(L)
IS
AN2(L) IS

ANI, AN2, ITRiAN AND NCP


- 1)
EQUAL TO NCML) / (NC(L)
EQUAL TO NC'L) / (NC(Ll + 1)
IF CLUSTER L IS UPDATED IN THE QUICiK-TRtANSFER STAGE
ITRAN(L)=l
ITRAlTN'L)=0 OTHJERWISE
IN THE1DOPTIMALITRANSFER STAGE, NCP(L) INDICATES THE STEP AT
WIiICHI CLUSTER L IS LAST UPDAITED

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

STATISTICAL
C
C
C

C
C
C
C
C
C

= AA / CM + 1.0)
AN2(L
AN1(L) = BIG
AN1(L) =
IF (AA GT. 1.0)
ITRAN(L)
I1
NCP(LL = -1
120 CONTINUE
INDEX = 0
DO 140 IJ _ 1, ITER

C
C
C
C
C
C
C
C

/ (AA - 1.0)

CALL OPTRA(A, M, N, C, K, ICI,


D, ITRAN, LIVE, INDEX)

IF (INDEX

IC2,

NC, ANI,

ANZ, NCP,

) GOtO 150

EAXCHPOINT IS TESTED IN TURN TO SEE IF IT SHOULD BE


REALDOCATED TO THE CLUSTER WIIICH IT IS MOST LIKELY TO
BE TRANSFERRED TO (IC2(I))
FROM ITS PRESENT CLUSTER (ICI(I)).
LOOP THIROUGHITHE DATA UNTIL NO FURTHER CHANGE IS TO TAKE PLACE

CALL QTRAN(A, Nl, N, C, K, ICi,


NCP, D, ITRAN, INDEX)

C
C
C

STOP IF NO] TRANSFER TOOK PLACE IN THE LAST M


OPTIMAL-TRANSFER STEPS

C
C
C
C
C
C

C
C

105

IN TIHIS STA\GE, THERE IS ONLY ONE PASS THROUGH THI DATA.


EACIH POINT IS REALLOCATED, IF NECESSARY, TO TIIE CLUSTER
TIHAT WILLi INDUCE THE MAXIMUMREDUCTION IN WITIIIN-CLUSTER
SUM OF SQUARES

C
C
C
C

ALGORITHMS

IN THE QUICK-TRANSFER STAGE, NCP(L) IS EQUAL TO THE STEP AT


WHICH CLUSTER L IS LAST UPDATED PLUS M

IC2,

NC, ANI,

AN2,

IF THERE ARE ONLY TWO CLUSTERS,


NO NEED TO RE-ENTER OPTIMAL-TRANSFER STAGE
(KC EQ. 2)

IF

GOT 150

NCP HIAS TO BE SET TO 0 BEFORE ENTERING OPTRA

DO 130 L 1, K
130 NCP W = 0
140 CONTIINUE
SINCE TlE SPECIFIED
NUMBER OF ITERATIONS
IFAULT IS SET TO BE EQUAL TO 2.
MAY INDICATE UNFORESEEN LOPING
TIIS

IS EXCEEDED

IFAULT = 2
CUMPUTE WITHIIN CLUSTER SUM OF SQUARES FOR EACH CLUSTER

150 Do 160 L

K
1-,
WSS(L) S 0.0
jX i6o J = 1, N
0.0
C(L, J)
16o CoNTINUE
DO 170 I
1,
It

II = ICi(I)
DO 170 J = 1, N
C(II,

J) = C(II,

170 CONTINUE

J) + ACI,

J)

DO 10 J
1, N
1, K
DOt 130 L
180 CML, J) = C(L, J) / FLOAT(NC(L))
DO 190 I = 1, M
II = ICI(I)
DA = A(I,
J) - CCII# J)
= WSS(II)
WSS(II)
+ DA * DA
190 COTlINUE
RETURN
END

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

106

APPLIED STATISTICS
C
C
C
C
C
C
C
C
C

SUBROUTINE OPTRA(A, M, N, C, K, ICI,


* AN2, NCP, D, ITRAN, LIVE, INDEX)
AIM(JRITHM AS 136.1
THIS

DEFINE

BIG TO BE A VERY IARGE POSITIVE

NCP(K)

NUMBER

IF CLUSTER L IS UPDATED IN THE LAST QUICK-TRANSFER STAGE,


IT BELONGS TO THE LIVE SET TIHROGHOUT THIS STAGE.
OTIHE-RVISE, AT EACH STEP, IT IS NOT IN THE LIVE SET IF IT
HItS NUT BEEN UPDATED IN THE LAST M OPTIMAL-TRANSFER STEPS
DO 10 L = 1, K
IF (ITRAN(L)
EQ. 1)
10 CONTINIJE
DO 100 I =1,
m
INDEX = INDEX, + 1
Li = 1C1(I)
I2 = IC2(I)
LL =
IF POINT
IF

(NC(LI)

LIVE(L)

- M + 1-

I IS THE ONLY MEMBER OF CWSTER IA, NO TRANSFER


*EQ.

IN GTO

90

IF L IHAS NUT YET BEEN UPDATED IN THIS


NO NE-ED TO RECOMPUTE D(I)

STAGE

IF (NCP(LI) .EQ. 0) GOTO30


DE =. 0.0
DO 20 J = 1, N
J
DF = A(I,
J - C(I=,
DE = DE + DF * DF
20 CONfTINUE
D(I) = DE * AN1(L1,
FIND TIM CLUSTERWITH MINIMUMf2
30 BJA= 0.0
DO 40 J

1, N

DB = A(I,
J. - C(IR,
DA = DA + M * D

NO.1

DATA BIG /1.OE1O/

C
C

VOL.28,

IS THE OPTIMAL-TRANSFER STAGE

(1979)

DIMENSION A(M, N), IC1(M),


IC2WMv, D(M)
DIMENSION C(K, N), NC(KW, AN1(K), AN2(K),
DIMENSION ITRAN(K),
LIVE(K)

C
C
C
C
C
C

STATIST,

NC, ANI,

EACII POINT IS REALWCATED, IF NECESSARY, TO THE


CLJSTER THlAT WILL INDUCE A MAXIMUMREDUCTION IN
TH}I1 WITHIN-CLUSTER SUM OF SQUlARES

C
C
C
C
C
C
C
C
C

APPL.

IC2,

J)

40 CONTINUE
R2 = DA * AN2()
Do 60 L
Is K

C
C
C
C
C
C

IF I IS GsREATERTHAN OR EQUAL TO LIVE.Li),


THEN tl IS
NUT IN TIHE LIVE SET.
IF THIS IS TRUE, WE ONLY NEED TO
CONSIDER CUJSTERS TIAT ARE IN TIE LIVE SET FOR POSSIBLE
TRANSFER OF POINT I.
UTHERWISE, WE NEED TO CONSIDER
ALL POSSIBLE CLUSTERS

IF

(I

.GE,

LIV,E(L)

AND.

I .GE.

LIVE(L)

i OR. L ,EQ. LL) GUTO60


L .EQ.
RR = NT / AN2(L)

OR.

DC = 0.0

DO 50 J = 1, NJ
DD = A(I,

J) - C'L,

DC = DC + DD * DD
n
IF (DC .GE. RR) GA

3)

6o

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

107

STATISTICAL ALGORITHMS
50 CONTINUE3
R2 = DC * AN2(L)
L2 = L

6o CoNTINUE3
IF

C
C
C

C
C
C
C

(R2

.LT.

DIV)

G(FO 70

IF NO TRAiNSFER IS

NECESSARY,

12 IS THE NE?t IC2(I)

= J2
IC2(I)
GTrO go
UPDATE CLJSTER CENTRES,
FOR CLUSTERS L AND 12,

LIVE, NCP, AN1 AND AN2


AND IC2(I)
AND) UPDAT13 IC1(I)

70 INDIX = 0

= M+ I
LIVE(L1)
= M + I
LIVEL'T)
= I
NCP(L1)
= I
NCP(LE)
ALl = NC(1)
ALY = ALA - 1. 0
ALE2 = C S(2)
ALT = ALTE?+ 1,0

1, N

DO 8o0J

C(L1,
C(L2,

J) = (C(L,r
J) - (C(2,

80 CONTINUE)

J) * ALI - A(I,
J) * AL9E + A(I,

1
NC(L1) = NCC(L1I
= NC(1 2) + 1
NC(L2)
= ALW / ALL
AN2 (LI)
AN1(LIL = BIG
= A1T
AN(L1
IF (ALW GT. 1.0)
= ALT / AU
AN1(L2)
= ALT / (ALT + 1,0)
AN2(L2)
= 2
IC1(I)
= LI
IC2(I)
90 COnUE=
IF (INDEX .EQ, MN RETURN
100 CONITINUE
1, 1
DOI 110 L
C
C
C
C
C

0
ITRAN (L)
- LIVE(L)
LIVE(L!
110 C(ONTINUE
RETURIN
END

C
C
C
C
C
C
C
C
C
C

C
C
C

(ALW

ALW
ALT

1.0)

ITRAN(Ll IS SET TO ZERO BEFORE ENTERING, QTRAXL


liAS TO BE DECREASED BY M BEFORE
ALSO, LIVE(L)
RE-ENTERING OPTRAt

J))
J))

- M

SUBROlTINE QTRAN(A, M, N, C, K,
ANE, NCP, D, ITRAN, INDEX)

ALORITIJULAS 136.?.

IC1,

IC2,

NC, AN1,

APPL. STATIST. (17q)

VOL.28, NO.1

TIIIS IS TIIE QUICK TRANiSFER STAGE.


IS TIHE CLJSTER WHIICII POINT I BELONGS TO.
IC1(IN
IS TE CLUJSTER WHIICHIP0INT I IS MOST
IC2(I)
LIKEL.Y TO BE TR.ANSFERRED TO.
AND IC2(I*
ARE SWITCHED, IFs
FOR EACH POINT I, IC1(I)
NECESSARY, TO REDUCE WITHIN CLUSTER SUM OF SQUARES.
THE CLJUSTER CENTRES ARE UPDATED AFTE1R EACH STEP
DIMENSION
DIMENSIOi
DEFINE
DATA BIG

IC2(M),
A(V, N), IC1(M),
C(KC, N1), NC(KC), AN1(1),

D(M)
AN2(K),

BIG TO BE A VERY LRGlE POSITIVE

NCP(K),

ITRAN(K)

NUMIIER

/1,OE1O/

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

APPLIED STATISTICS

108

IN THE )OPTIMAL-TRANSFERSTAGE, NCP(L) INDICATES THE


STEP AT WHICHICUJSTER L IS LAST UPDATED
IN TIHE QUICIC-TRANSFER STAGE, NCP(LN IS EQUAL TO THE
STEP AT WHICII CLUSTER L IS LAST UPDATED PLUS M

C
C
C
C
C

ICOUN
ISTEP

=
=

0
0

10 DO 70 I = 1, M
ICWUN ICOlUN+ 1
ISTEP + 1
ISTEP
LI = IC1(I)
L2 = ICZ(I)

IF POINT I IS THE ONLY MEMBEROF CUJSTER Li,

C
C

*EQ. 1) GOTO 6

IF (NC(.L1)
C

C
C
C
C
C

NO TRANSFER

IF IST1EP IS GREATER THIANNCP(LA), NO NEED TO RECOMPUTE


DISTANCE FROM4POINT I TO CLUSTER LI
NOTE THAT IF CLUSTER LI IS LAST UPDATED EXACTLY MASTEPS
AGO WE STILL NEED TO COMPUTE THE, DISTANCE FROM POINT I
TO CLUSTER LI

IF (ISTEP
0.0
DO 20 J =
DB = A(I,
DA = DA +
20 CONTIlUE
D(I) = DA
DA =

C
C
C

.GT. NCP(L1))
1, NJ
J) - C(LX,
DO * DO
*

GOTO 30

J)

ANl(Li)

IF ISTEP IS GREATER THAN OR EQUAL TO BOTl NCP(Li) AND


NCP(12) THERE WILL BE NO TRANSFER OF POINT I AT THIS STEP

30 IF (ISTEP
R2 = D(I)

.AND. ISTEP

.GE. NCP(i)
/ AN2(L2)

GE. NCP(L2))

GOTo

6o

DD = 0.0

DO 40 J = 1, N
DE = A(I, J) - C(L2,
DD =

DO + DE

* DE

IF (DD
GE. R12) GOTr
40 COtNTINUE
C

J)

Go

UPD}ATECLJUSTERCENTRES, NCP, NC, ITRAN, ANI AND AN2


AND IC2(I).
FOR CLlJSTERS LA AND 12.
ALSO, UPDATE IC1(I)
NOTE THIAT IF Al4Y UPDATING OCCURS IN THIS STAGE,
INDEX IS SET BACK TO 0

C
C
C
C
C

0
0
INDEX
ITRAN(Li) = 1

ICOUN

= 1

ITRAWT(2)

NCP(L1) = ISTEP + M
NCP(L2) = ISTEP + M
ALi = N4C
(LI)
ALW = ALL - 1.0
AI2 = NC(U.)
ALT = AL2 + 1. 0

DO 507 = 1, N
J) * ALI - A'I, J.) / ALU
C(LI, J) = (C(Li,
(C(12, J) * ALS + A(I, J)) / ALT
C(L2, J)
50 CONTINIUE
- 1
=NC(Li
NC(LI)
NC(L2) = NC(12) + 1
=
ALU( / ALi
AN2(L1)
AN1(LU) = BIG
= ALW / (ALW - 1.0)
IF (ATJ .GT. 1.0) AN1()
AN1(L-2) = ALT / ALI
AN2(12) = ALT / t(ALT + 1.0)
-= 2
ICiCI)
IC2(I)
C

I,

IF NO REALUICATION TOOK PLACE IN TliE LAST M STEPS,

C
C

6o

IF (ICOUN .EQ.
70 CONTINUE
GOYTOl
10
IND

RERN

M) RETURN

This content downloaded from 143.107.12.71 on Sat, 19 Dec 2015 14:17:36 UTC
All use subject to JSTOR Terms and Conditions

You might also like