CH2915-7/90/0000/0048$01.00 © 1990 IEEE
Authorized licensed use limited to: Chameli Devi Inst of Tech and Management. Downloaded on July 31,2010 at 06:24:18 UTC from IEEE Xplore. Restrictions apply.
processor. This CAM-based processor, referred to as The Coherent Processor (CP), has an architecture which replaces the traditional random-access main memory with CAM. The CP is designed to plug into personal computers or engineering/scientific workstations and to function as a hardware accelerator. By using this associative processor it is now possible to verify that the response time of a classifier system which uses CAM will be invariant with respect to the number of rules in the system.

3.0 Classifier Systems and BOOLE

Classifier Systems, introduced by Holland [4], are adaptive machine learning systems which have a rule-based message-passing structure. Classifier Systems learn through the application of genetic algorithms [3] to introduce rules into the system and via credit apportionment to distribute reward from the environment. Classifier Systems are meant to operate in environments which are changing, noisy and give reward infrequently and usually only after long sequences of actions. The system often has goals which are implicitly and/or inexactly defined and has real-time demands for actions [1].

A classifier system consists of the following major components: 1) Messages and Classifiers, 2) An Environment Interface, 3) Rule Discovery, and 4) Credit Apportionment. Messages are simple bit strings that are used to represent either information from or commands to the environment interface. Classifiers are simple IF-THEN rules with multiple conditions and an action. Conditions are message patterns which are matched against input messages, while actions are message patterns for the generation of output messages. Each classifier has an associated strength which quantifies the frequency with which a classifier directly or indirectly led to reward from the environment. The environment interface consists of detectors and effectors. Detectors encode information from the environment into messages and effectors modify the environment based on messages. Rule discovery is used to evolve new classifiers to add to the system. The most commonly used heuristics for rule discovery are Genetic Algorithms. Credit apportionment is the means by which the classifier strengths are updated. The approach which has been developed is called the bucket brigade algorithm and is used to distribute reward, as strength, to the chain of classifiers which led to a correct decision. This is achieved by having active classifiers pay a portion of their strength to classifiers active in the previous cycle.

BOOLE, a simplified version of the standard classifier systems, was first proposed by Wilson [11] in his initial studies of machine learning paradigms for autonomous robots or ANIMATS. BOOLE essentially eliminates the requirement to use the bucket brigade algorithm to apportion payoff from the environment, yet retains all of the aspects of classifier systems which are essential to machine learning. BOOLE is a system which is meant to learn disjunctive Boolean functions, which have been shown to be difficult machine learning problems. The function which Wilson and others have studied is the multiplexer function, where there are k address bits (a_k) and 2^k data bits (d_k). The system is to learn to use the address bits to select the correct data bit as a response to an address/data vector from the input interface. For a 6-multiplexer there are 2 address bits and 4 data bits. One iteration of the BOOLE system entails the random generation of a binary string of length 6, which is presented to the system as an input message. BOOLE is then to respond with the correct response; for a correct response the system receives a reward, whereas an incorrect response is penalized. Only the classifiers which matched the input message and have an action equal to the response are rewarded or penalized.

4.0 The Coherent Processor

The Coherent Processor (CP) [6] is the target processor for which the Associative Classifier System (ACS) was written. The CP is a product of the Syracuse University Machine for Associative Computing (SUMAC) project (see [2], [7] or [10]). The main thrust of this project is to develop an associative architecture which is suitable for the high speed execution of logic programs. As such, many aspects of the CP's architecture are derived from the requirements of executing logic programs. However, the architecture which has resulted is quite general in nature and provides the capability to implement many algorithms. Its development has just recently been completed and a prototype is in our lab under evaluation for AI applications. The CP is a Microchannel board which has been installed in an IBM PS/2 Model 70 and operates as a memory-mapped I/O device. The board contains 4Kx36 words of Content Addressable Memory (CAM). The amount of CAM is expandable by adding additional CAM expansion boards to the system. In addition to the CAM there is a 64K byte writable control store (WCS) on the CP which controls the operation of the CAM. The WCS is typically loaded at the start of an application with the microcode to perform the required read, write, search, and logical CAM operations. The microcode routines are called as co-processor functions from the application program.

The heart of the CP is a specially designed VLSI chip which contains the CAM array and support logic. The support logic processes the match lines from the CAM array and consists of 5 one-bit registers, a boolean logic unit and a Multiple Response Resolver (MRR) per CAM word. The MRR, fed by the MRReg, guarantees activation of only one CAM row select at a time by passing only the topmost active bit of the MRReg. A microcode operation, SelectNext, causes the MRR to reset the topmost active bit in the MRReg. This allows the MRReg to be used as a pointer into the CAM array which can be moved from the topmost to the bottommost active word by performing SelectNexts.

Figure 1. Format of an ACS Quad CAM Word: Quad CAM word is constructed from an even/odd pair of CAM words.

A single quad word in the ACS is structured as shown in Figure 1. This allows, with the use of the CP's masking capability, for fully associative searches to be performed on either the condition or the strength field of the classifiers. Both of these searches are required for the algorithms used in ACS.

5.2 Classifier System

During each iteration the classifier system must perform three major functions: 1) message matching, 2) response selection and 3) rule strength adjustment.
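The three functions listed above can be illustrated with a small serial sketch in Python. It is not the ACS microcode: the CAM performs the matching step in parallel, whereas this sketch loops over a list. The dictionary layout, the address-bits-first message ordering, the strength-weighted vote used for response selection, and the `reward` and `penalty` amounts are all assumptions made for illustration; the text only states that matching classifiers whose action equals the response are rewarded or penalized.

```python
def multiplexer(bits, k):
    """Return the data bit selected by the first k (address) bits of `bits`.

    Assumes the message is laid out as k address bits followed by 2**k data bits.
    """
    address = int("".join(str(b) for b in bits[:k]), 2)
    return bits[k + address]


def matches(condition, message):
    """A condition over '0', '1', '#' matches when every non-# symbol agrees."""
    return all(c == "#" or int(c) == m for c, m in zip(condition, message))


def boole_iteration(classifiers, message, k, reward=1.0, penalty=0.5):
    """Run one iteration; returns the response, or None if nothing matched."""
    # 1) Message matching (parallel in the CAM, a plain loop here).
    match_set = [c for c in classifiers if matches(c["condition"], message)]
    if not match_set:
        return None

    # 2) Response selection: hypothetical strength-weighted vote.
    votes = {0: 0.0, 1: 0.0}
    for c in match_set:
        votes[c["action"]] += c["strength"]
    response = max(votes, key=votes.get)

    # 3) Strength adjustment: only matching classifiers whose action
    #    equals the posted response are rewarded or penalized.
    correct = response == multiplexer(message, k)
    for c in match_set:
        if c["action"] == response:
            if correct:
                c["strength"] += reward
            else:
                c["strength"] = max(0.0, c["strength"] - penalty)
    return response
```

For the 6-multiplexer, `k` is 2 and each message is a list of six bits, for example `[1, 0, 0, 0, 1, 0]`, where address `10` selects the third data bit.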
CAM, one classifier is selected to post its action as the output message.

Based on the response from the environment, which in this case is just an evaluation of the multiplexer function to test the correctness of the posted output, the strengths of the rules are updated via the distribution of reward or penalty. Once the strengths have been updated the rules are written back to the CAM array, into the same locations from which they were read. A copy of the match vector from the message matching step is maintained in one of the response registers for this purpose. The MRReg is loaded with the match vector and the MRR is used to select the individual CAM locations to write.

5.3 Genetic Algorithms

In the BOOLE system the genetic algorithms are initiated once for every iteration, and generate a single rule; this constant but slow evolving of rules is to help keep the system stable and reliable, yet adaptive. For BOOLE, the selection of parents and replacements is the component of the genetic algorithms which requires the most computational effort. Selection of parents and replacements is based on the strengths of the classifiers in the CAM.

A selection algorithm has been developed which avoids the necessity of reading the entire contents of the CAM into host-processor memory to perform a weighted strength selection. The idea is to use a windowing mechanism, based on classifier strengths, to reduce the number of classifiers from which to select. Once the number of classifiers has been significantly decreased, more serial approaches can be used to make the selection. First the maximum and minimum values of the strength fields in the CAM are determined. These searches can each be performed in O(n) CAM operations [5], where n is the length of the strength field. Then a random point between the minimum and maximum strengths is selected. This serves as the center for a window, the width of which is a percentage of the difference between the maximum and minimum strengths. The width of the window, referred to as window size, is a system parameter and directly affects the operation of the ACS. A between-limits search of the CAM is then performed to locate all strength fields which are inside the window. This search can be performed in O(2 log(n-1)) CAM operations [5], where n is the width of the search window. Several conditions can now have occurred: 1) there are no strength fields inside the window, 2) there is one strength field in the window, or 3) multiple strength fields lie within the window. For case 1, a new window is generated. Case 2 selects the classifier whose associated strength field is in the window. Case 3 requires that one of the classifiers in the window be selected; this is done on a purely random basis by counting the number of matches and picking a random number, RN, between 1 and the number of matches. The classifier is selected by performing RN SelectNext operations on the MRReg until the MRR is pointing to the selected CAM word. Note that this introduces 2 steps of serial processing into the selection procedure: counting matches and performing the SelectNext operations.

The window size parameter is very critical to the performance of ACS, both in terms of machine learning ability and response time. If the window is too large, then response time is excessive because there will be a large number of classifiers inside the window which will have to be processed serially. Also, since the serial selection procedure used by ACS is completely random, the effectiveness of the strengths in influencing parent selection would be reduced by too large a window. This has the effect of limiting the system's ability to generate better classifiers from known good classifiers, and thus slows the rate at which learning occurs. Going to the opposite extreme, a very small window will cause the program to generate many windows before finding a classifier. This will also increase response time.

Because the optimal value of the window size parameter is directly related to the number of classifiers in the system, a second selection procedure has recently been implemented. This approach, the nearest selection (NS) algorithm, is the same as window selection up to and including the generation of a window center or chosen point. This approach finds the smallest strength value greater than the chosen point for parent selection and the greatest strength value less than the chosen point for replacement selection. This is done by performing a series of searches starting at the chosen point and moving to an extremum. The idea is to use the search point and the extremum as the initial bounds on a search area. After every search the bounds are adjusted inward until: 1) the bounds are equal, 2) only one classifier lies between them, or 3) no classifiers were between the initial bounds. Case 1 occurs when multiple strengths with the same value have been found, in which case the serial selection technique described above is used. Case 2 selects the classifier which was located. Case 3 causes a retry of the selection procedure.
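The two selection procedures described in this section can be sketched on the host side as follows. This is a rough Python analogue under stated assumptions, not the CAM microcode: list comprehensions stand in for the between-limits and bounded CAM searches, indexing into the list of hits stands in for repeated SelectNext operations, and `nearest_select` computes the nearest strength directly instead of narrowing bounds search by search. The function names, the `rng` argument and `max_tries` are hypothetical.

```python
import random


def window_select(strengths, window_size, rng, max_tries=1000):
    """Window selection: return the index of one classifier, or None."""
    lo, hi = min(strengths), max(strengths)
    width = window_size * (hi - lo)  # window size is a fraction of the range
    for _ in range(max_tries):
        center = rng.uniform(lo, hi)
        # Stand-in for the CAM between-limits search on the strength field.
        inside = [i for i, s in enumerate(strengths)
                  if center - width / 2 <= s <= center + width / 2]
        if not inside:           # case 1: empty window, generate a new one
            continue
        if len(inside) == 1:     # case 2: exactly one strength in the window
            return inside[0]
        # case 3: pick RN in 1..count and step to that hit (SelectNext analogue)
        rn = rng.randint(1, len(inside))
        return inside[rn - 1]
    return None


def nearest_select(strengths, point, for_parent, rng):
    """Nearest selection around `point`; None means the caller should retry."""
    if for_parent:
        candidates = [s for s in strengths if s > point]
        target = min(candidates) if candidates else None
    else:
        candidates = [s for s in strengths if s < point]
        target = max(candidates) if candidates else None
    if target is None:           # case 3: nothing between the initial bounds
        return None
    ties = [i for i, s in enumerate(strengths) if s == target]
    if len(ties) == 1:           # case 2: a single classifier was located
        return ties[0]
    # case 1: several classifiers share the value, fall back to a random pick
    return ties[rng.randint(1, len(ties)) - 1]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing window sizes.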
6.0 Results
ACS results: 1) verify that ACS learning ability is
To measure speedups the set of C routines which interface to the CP was rewritten to work with an array of classifiers in host processor memory. The modified C routines are not just a direct mapping of the CAM functions onto a memory array, but were optimized to use efficient algorithms. For example, the routine which selects parents was restructured to use a typical selection algorithm which requires that the strength fields be examined sequentially until a selection is made. A non-CAM implementation of the window selection technique described above would have run slower than the sequential examination approach.

The response times for a single iteration for the ACS and the non-CAM ACS for various numbers of classifiers are shown in Figure 3. Response time includes the input message matching and the output message generation, as well as the generation of one new rule via genetic algorithms. At first glance these times are somewhat disappointing because although the CAM version executes faster, it only does so by a factor of about 2 and the response time increases noticeably as the number of classifiers increases. There are several factors which account for the ACS response times being slower than anticipated. The first reason is hardware based and results from running under PC DOS. Due to the design of the CP interface, the host C code must switch from real to protected mode, requiring approximately 30 microseconds of overhead, for each command sent to the CP. With 1000 classifiers and a window size of .001 on the 6-multiplexer problem there are an average of 349 commands sent to the CP per iteration, for an overhead of 10.4 milliseconds. As the number of classifiers increases the number of commands sent to the CP increases, particularly for the 6-multiplexer problem, and hence the impact of the overhead becomes more severe. The reason the 6-multiplexer problem is likely to suffer greater overhead is explained below. There are versions of the CP under development which don't require this switch. The other reason is a result of the BOOLE system. There isn't enough pattern matching in BOOLE to take advantage of this capability of the CAM. However, a full scale classifier system which has a message list will have a greater amount of pattern matching. This is because each message in the message list needs to be matched against each classifier. Thus, the size of the message list becomes a major factor in the amount of speedup which will be observed between CAM and non-CAM classifier system implementations. Additionally, classifier systems which require more than 32 bits per condition or have more than one condition per classifier would benefit from the CAM row logic. The row logic can integrate the results from multiple word matches (CAM items longer than 32 bits) and from multiple condition matches in parallel across all classifiers, whereas a non-CAM program would have to process this data serially.

The reason that there is an increase in response time as the number of classifiers was increased is due to the section of the program which is required to read all matching entries from the CAM. The number of matches which occur in the BOOLE system is a function of the problem size. The average number of matches is given by:

Matches = P / 2^((2/3) N)    (1)

where N is the number of address and data bits and P is the number of classifiers. This equation assumes that the DON'T CARE symbol makes up one third of all symbols. This equation indicates that by trying to solve a larger multiplexer problem the number of matches per iteration should decrease, thus decreasing response time as shown in Figure 4. Once a large enough problem is selected, ensuring that only a few classifiers are active per cycle, the response time will increase slowly with respect to the number of classifiers.

Figure 3. ACS versus non-CAM ACS response time for 6-Multiplexer Problem
Figure 4. Execution speed for 11-Multiplexer Problem
(Plot legend: Without CAM; With CAM, Nearest Selection; With CAM, Nearby Selection.)

In both Figure 3 and Figure 4, the effect of the window size parameter for the classifier selection algorithm is shown. Decreasing the size of the window almost always caused the response time to increase, except in cases with small numbers of classifiers. When the number of classifiers was less than 800 the lowest value for window size which could be used without causing the response time to deteriorate was .001. These figures also show the performance of the NS algorithm. On the 6-multiplexer problem the NS algorithm ran a little slower than the window method using a window size of .001 for greater than 500 classifiers. This is a result of distributing reward to a large number of classifiers per cycle, causing classifier strengths to remain small. Thus many classifiers will share the same strength value, which results in the NS algorithm applying serial selection techniques. This, coupled with the fact that NS requires a greater number of computations, is the cause of NS's slower performance. NS algorithm performance on the 11-multiplexer shows that it can achieve the fastest response times. This and the fact that there aren't any parameters to tune to make the NS algorithm run optimally makes it the better of the two selection algorithms.

Figure 5 shows the performance of ACS for the 11-multiplexer problem when the overhead due to DOS mode switching is removed.

7.0 Conclusions

The associative algorithms of the ACS for the Coherent Processor have been presented. It has been demonstrated that this associative implementation of the BOOLE classifier system learns as well as results published for serial implementations. It has been shown that the use of an associative processor as a co-processor can decrease classifier system response time, particularly for classifier systems with a large number of rules. In fact, when the number of rules in the ACS was increased by an order of magnitude the response time of the system increased only 25% after DOS overhead was removed. Also, if the amount of data transferred between the associative processor and the host is minimized the response time can be almost invariant to the number of classifiers which make up the system. This is the case when there are only a small percentage of classifiers active per cycle, such as with the 11-multiplexer problem, for which a speedup of greater than 14 was observed. Based on the results from the CAM ACS for the 11-multiplexer problem a full scale classifier system is currently being developed to verify whether significant speedups can be attained because of larger, more sophisticated matching requirements.
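The conclusion's point about the small fraction of classifiers active per cycle can be checked numerically against Equation (1) of the Results section, along with the mode-switch overhead quoted there. The sketch below is plain arithmetic in Python; the function names are hypothetical, and the 30 microsecond figure is the approximate value given in the text, so the computed overhead (about 10.5 ms for 349 commands) only roughly reproduces the quoted 10.4 ms.

```python
def expected_matches(num_classifiers, n_bits):
    """Equation (1): Matches = P / 2**((2/3) * N).

    Assumes, as the text does, that one third of all condition symbols
    are DON'T CARE.
    """
    return num_classifiers / 2 ** (2 * n_bits / 3)


def mode_switch_overhead_ms(commands_per_iteration, us_per_command=30.0):
    """Real-to-protected mode switch cost per iteration, in milliseconds."""
    return commands_per_iteration * us_per_command / 1000.0


if __name__ == "__main__":
    for n_bits in (6, 11):
        print(n_bits, expected_matches(1000, n_bits))
    print(mode_switch_overhead_ms(349))
```

With 1000 classifiers, Equation (1) gives 62.5 expected matches per iteration for the 6-multiplexer and roughly 6 for the 11-multiplexer, which is the order-of-magnitude drop in active classifiers that the response-time argument relies on.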
Acknowledgements

I would like to thank my peers who have helped with

6. Coherent Research Inc., Coherent Processor Users Guide Ver 2.0, Coherent Research Inc., 1990.
12. Stewart W. Wilson, "Classifier Systems and the Animat Problem," Machine Learning, vol. 2, pp. 199-228, 1987.