
Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1956, defines it as "the science and engineering of making intelligent machines." The field was founded on the claim that a central property of humans, intelligence (the sapience of Homo sapiens), can be so precisely described that it can be simulated by a machine. This raises philosophical issues about the nature of the mind and the ethics of creating artificial beings, issues which have been addressed by myth, fiction and philosophy since antiquity. Artificial intelligence has been the subject of optimism, but it has also suffered setbacks; today it has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science.

AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other. Subfields have grown up around particular institutions, the work of individual researchers, the solution of specific problems, long-standing differences of opinion about how AI should be done, and the application of widely differing tools. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects. General intelligence (or "strong AI") is still among the field's long-term goals.

Currently, no computer exhibits full artificial intelligence (that is, none can simulate human behavior). The greatest advances have occurred in game playing. The best computer chess programs are now capable of beating humans: in May 1997, an IBM supercomputer called Deep Blue defeated world chess champion Garry Kasparov in a chess match.
In the area of robotics, computers are now widely used in assembly plants, but they are capable only of very limited tasks. Robots have great difficulty identifying objects based on appearance or feel, and they still move and handle objects clumsily. Natural-language processing offers the greatest potential rewards because it would allow people to interact with computers without needing any specialized knowledge. You could simply walk up to

a computer and talk to it. Unfortunately, programming computers to understand natural languages has proved to be more difficult than originally thought. Some rudimentary systems that translate from one human language to another exist, but they are not nearly as good as human translators. There are also voice recognition systems that can convert spoken sounds into written words, but they do not understand what they are writing; they simply take dictation. Even these systems are quite limited -- you must speak slowly and distinctly.

In the early 1980s, expert systems were believed to represent the future of artificial intelligence and of computers in general. To date, however, they have not lived up to expectations. Many expert systems help human experts in such fields as medicine and engineering, but they are very expensive to produce and are helpful only in special situations.

There are two goals in AI. The bigger one is to produce an artificial system that is about as good as or better than a human being at dealing with the real world. The second goal is more modest: simply produce small programs that are more or less as good as human beings at doing small specialized tasks that require intelligence. To many AI researchers, simply doing tasks that require intelligence in human beings counts as artificial intelligence even if the program gets its results by some means that shows no intelligence at all; thus much of AI can be regarded as "advanced programming techniques." The characteristics of intelligence given here would, I think, seem quite reasonable to most normal people; however, AI researchers and AI critics take various unusual positions and in the end everything gets quite murky. Some critics believe that intelligence requires thinking, while others say it requires consciousness. Some AI researchers take the position that thinking equals computing, while others don't.
A much more meaningful method of determining whether or not a computer is thinking would be to find out exactly what people are doing; if the artificial system is doing the same thing, or something very close to what a human being does, then it becomes fair to equate the two.

One of the positions on intelligence mentioned in this section is that it requires consciousness, and that consciousness is produced by quantum mechanics. For those of you who have been denied a basic education in science by our schools, quantum mechanics goes like this. By the beginning of the 20th century physicists had discovered that electrons, protons and various other very small

particles were not obeying the known laws of physics. After a while it became clear that particles also behave like waves. The best-known formula found to describe how the particles move around is the Schrödinger wave equation, a second-order partial differential equation that uses complex numbers. Since then quantum formulas have been developed for electricity and magnetism as well. Although, as far as anyone can tell, the formulas get the right answers, the interpretation of what's really going on is in dispute. The formulas and some experiments apparently show that at times information is moved around not just at the very slow speed of light but INSTANTLY. Results like this are very unsettling. One of the developers of QM, Niels Bohr, once said: "Anyone who isn't confused by quantum mechanics doesn't really understand it." But relax: there is virtually no QM in this book, certainly not the formulas and nothing to get confused about. On the other hand, research into applying quantum mechanics to human thought may soon make it necessary to include QM in psychology and AI books.

Bottom-lining it, then: there are terrible disagreements between various camps in and out of AI as to what really constitutes intelligence, and it will be a long time before it is sorted out.

much more powerful computing device than anyone has suspected. I've seen estimates that the brain processes from 10^23 to 10^28 bits per second with this architecture, well beyond the 10^16 bits per second estimated with the neuron-as-a-simple-switch architecture. Another suggestion is that the microtubules could be set up to allow optical computing. For more on the microtubules see The Quantum Basis of Natural Intelligence? page and/or the article by Dimitri Nanopoulos available from Los Alamos National Laboratory. For the sake of trying to produce intelligent behaviour, however, all that's really being done is work with artificial neural networks, where each cell is a very simple processor and the goal is to try to make them work together to solve some problem. That's all that gets covered in this book. Many people are skeptical that artificial neural networks can produce human levels of performance because they are so much simpler than biological neural networks.

Symbol Processing
Symbol processing has been the dominant theory of how intelligence is produced, and its principles can be used to do certain useful tasks. Using this method you write programs that work with symbols rather than numbers. Symbols can be equal or not equal, and that is the only relation defined between symbols, so you can't even ask if one is less than another, much less do arithmetic with them. Of course, in symbol processing programs the symbols do get represented by integers. These researchers have never been interested in how to implement symbol processing using neural networks. Besides the use of symbols, the other key principle involved is that knowledge in the mind consists of a large number of rules. Symbol processing adherents make the bold claim that symbol processing has the necessary and sufficient properties to display intelligence and to account for human thought processes. Even though they say that symbols can only be equal or not equal and no other relations are defined for them, quite often "symbolic" programs end up using integers or reals as part of the program, and such a program is called symbolic anyway, even though by the strictest standard it is no longer wholly symbolic, only mostly symbolic. Relatively little has been released on the CYC project. One FAQ on CYC has been produced by

Association
Perhaps the most important principle in all of AI is found here: when one idea has been associated with another, then whenever one comes up again the other will too. This was put very nicely by William James in his works on psychology, first published in 1890 (yes, eighteen ninety!). Several examples are given. James' books are well worth reading if you ever have the time to spare. His other ideas come up in cognitive science fairly often.

Neural Networking
When people first discovered that nerve cells pass around pulses of electricity, many people ASSUMED that this activity was used to produce thought. Heck, what else was there, if you were a materialist? (The only alternative was the idea that there is a human soul, and the materialists didn't want anything to do with that!) Now there is a new theory floating about because of recent discoveries about cells. Almost every cell has a cytoskeleton built out of hollow tubes of protein. The tubes are called microtubules, and there is now a proposal that these structures are involved in quantum mechanical computing. If thinking is done this way then the human mind is a

David Whitten. The official CYC WWW site is Cycorp, Inc. My opinion is that CYC may be quite useful for certain applications, but it still won't be operating like a human being, or anywhere near as well as one, so in those most important senses it will be a failure.

Heuristic Search
Another of the principles used in AI is heuristic search. In heuristic search (as opposed to blind search) you use some information about the problem to try to solve it quickly.
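To make the distinction concrete, here is a tiny sketch in Python (my own toy example, not code from the book): a greedy best-first search on a small grid, where the heuristic is the Manhattan distance to the goal. Blind search would expand nodes in an arbitrary order; the heuristic steers expansion straight toward the goal.

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Always expand the frontier node that the heuristic h says is
    closest to the goal, instead of searching blindly."""
    frontier = [(h(start), start)]
    came_from = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:                      # rebuild the path found
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None

# Toy problem: walk from (0,0) to (4,4) on an empty 5x5 grid.
def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

h = lambda p: abs(p[0] - 4) + abs(p[1] - 4)   # Manhattan distance heuristic
path = greedy_best_first((0, 0), (4, 4), neighbors, h)
print(len(path))  # 9 nodes: one of the shortest paths
```

On this empty grid the heuristic leads straight to a shortest path; on harder problems greedy search can be led astray, which is why algorithms like A* also count the cost incurred so far.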

The idea in this chapter is to do a little pattern recognition with concrete, down-to-Earth problems like recognizing letters, digits and words. The more abstract version of pattern recognition comes along in chapter 3. You could argue that chapter 3 should therefore come before chapter 2, except I think abstraction should come second, not first. Also, this chapter shows that "simple" pattern recognition like recognizing letters is not so simple after all: it is tied up with everything we know, all the way up to the highest levels.

A Simple Pattern Recognition Algorithm


The idea here is to show a simple hand-crafted neural network, inspired by James' principles of association, that can recognize a few letters of the alphabet. The weights are chosen by hand; however, the algorithms coming up in the next chapter can be applied to setting the weights as well, and they probably do a better job than anything you can hand-craft. In this section I assume the matrix containing the letters is simply passed to a conventional program which finds the short line segments that are used. This portion isn't really neural; however, in the next section I show how this too can be done with a neural network. It's quite easy to program the model in this section. From time to time people ask in comp.ai.neural-nets how to do character recognition, so here is a description of it based on what I have in the book.
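A minimal sketch of the hand-crafted-weights idea (my own toy setup, not the book's network): each letter is a 5x5 bit pattern, and the "network" has one output unit per letter whose weights are set by hand to +1 where the letter has a pixel and -1 where it does not. The unit with the largest weighted sum wins, which gives some tolerance to noisy input.

```python
LETTERS = {
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
}

def to_bits(rows):
    """Flatten a 5x5 picture into a list of 25 bits."""
    return [1 if c == "#" else 0 for row in rows for c in row]

# Hand-chosen weights: +1 on the letter's pixels, -1 everywhere else.
WEIGHTS = {name: [1 if b else -1 for b in to_bits(rows)]
           for name, rows in LETTERS.items()}

def classify(rows):
    bits = to_bits(rows)
    scores = {name: sum(w * b for w, b in zip(ws, bits))
              for name, ws in WEIGHTS.items()}
    return max(scores, key=scores.get)

# A "T" with one extra pixel is still recognized.
noisy_t = ["#####", "..#..", "..#..", ".##..", "..#.."]
print(classify(noisy_t))  # T
```

The learning algorithms of the next chapter amount to setting those +1/-1 entries automatically from examples instead of by hand.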

The Problems with AI


Many people have grave doubts about symbol processing AI and are looking for alternatives. Others are still confident that symbol processing will work.

The New Proposals


If you ask me, and many others, using just symbol processing and heuristic search has not been working out very well, and new ideas have to be investigated. It's pretty straightforward to list most of them: you only have to take the assumptions of classical symbol processing AI and list the alternatives. First, symbols are not enough; real numbers are necessary. Second, symbols and structures of symbols are not enough; pictures are necessary (and of course representing them requires using numbers). Third, rules are not enough; actual cases are necessary. Another alternative is that quantum mechanics will be necessary. MacLennan sees the use of real numbers and images (pictures) as the main characteristics of connectionist AI. Of course the symbol processing AI camp doesn't like connectionist AI, and one of their big criticisms is that in connectionist AI there is no way to structure data with vectors of real numbers, and so it can't possibly work. This is a good point: there is no ESTABLISHED GOOD WAY of structuring data in vectors, but the research is moving along, and some of it will come up in Chapter 6. MacLennan deals with this criticism from the symbol processing camp in a couple of very worthwhile papers that are fairly easy to read, as such things go, and I would love to see some commentary on them from people in the symbol processing camp.

A Short Description of the Neocognitron


This section shows how even finding the short line segments in a pattern can be done neurally. The weight adjustment formulas and activation functions of the neocognitron are not discussed. The neocognitron is capable of two types of learning: a supervised mode, where you give the network the answer, and an unsupervised mode (much slower), where you simply present it with pattern after pattern.
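As a hedged illustration of the first stage only, here is a toy line-segment detector in Python: fixed masks standing in for the neocognitron's first layer of feature-detecting cells. This is my own simplification, not Fukushima's actual formulas (which, as noted, are not discussed here anyway).

```python
# Each mask lists the offsets of a short 3-pixel segment.
MASKS = {
    "horizontal": [(0, -1), (0, 0), (0, 1)],
    "vertical":   [(-1, 0), (0, 0), (1, 0)],
}

def detect(image, kind, threshold=3):
    """Return the centers of neighborhoods that fully match the mask."""
    hits = []
    h, w = len(image), len(image[0])
    for r in range(h):
        for c in range(w):
            s = sum(image[r + dr][c + dc]
                    for dr, dc in MASKS[kind]
                    if 0 <= r + dr < h and 0 <= c + dc < w)
            if s >= threshold:          # all three mask pixels are on
                hits.append((r, c))
    return hits

img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(detect(img, "horizontal"))  # [(1, 2)]
```

Each mask is just a little neural unit with fixed weights; the real neocognitron learns such feature detectors and stacks many layers of them.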

Recognizing Words
Recognizing letters and symbols is not done in a vacuum; it is normally done as part of some task where your thinking biases you to see certain patterns. In this section I show how a more sophisticated type of network, an interactive activation network, can be used to capture this bias in


a realistic way. The example comes from a famous work by Rumelhart and McClelland. I did a not-at-all-fancy version of the interactive activation network, and the C source, DOS binaries and elementary instructions are available. It can be used for some of the exercises.

methods must be used. The simplest such algorithm is the nearest neighbor classifier. You simply take your unknown (as a vector) and compute the distance from it to all the other patterns you have the answer for. The answer you guess for the unknown pattern is the class of the closest pattern. In the nearest neighbor algorithm you have to keep a large inventory of patterns and their classifications, so searching through this inventory for the closest match may take quite a long time. Lately some interesting variations on the nearest neighbor method have been developed that are much faster because you store fewer known patterns. The idea is to scatter a few prototype points, that is, representative patterns, around the space for each class. There is a whole series of algorithms to do this, called learning vector quantization (LVQ) algorithms. Perhaps the simplest of these is one called decision surface mapping (DSM), and this one is covered in the text. At one time I had the LVQ1 algorithm in the text as well, in another chapter called Other Neural Networking Methods, but in the end I thought it was best to cut this chapter from the book and make it available on the net.

I've done a nearest neighbor program in C, with DOS binaries, that implements the k-nearest neighbor algorithm, DSM and LVQ1. More LVQ software is available from the group that started it all, the LVQ/SOM Programming Team of the Helsinki University of Technology, Laboratory of Computer and Information Science, Rakentajanaukio 2 C, SF-02150 Espoo, Finland. It comes as UNIX source and as a DOS self-extracting file. There are also other programs that can be used with this LVQ software.
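The basic k-nearest neighbor idea fits in a few lines; this is a minimal sketch (my own, separate from the C program mentioned above), with made-up two-dimensional data:

```python
from collections import Counter

def knn(train, query, k=3):
    """train is a list of (vector, label) pairs.  Guess the majority
    label among the k stored patterns closest to the query."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn(train, (1, 1)))  # A
print(knn(train, (5, 4)))  # B
```

The `sorted` call is the slow full-inventory search the text complains about; LVQ and DSM replace `train` with a handful of prototype points so that the same loop runs over far fewer patterns.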

Expanding the Pattern Recognition Hierarchy


If you think recognizing letters of the alphabet is a problem, it gets still worse: interpreting the meaning of a sentence depends on knowing a lot about language and a lot about the world, and this section shows some examples.

Additional Perspective
I thought it best to approach pattern recognition from the standpoint of identifying letters of the alphabet because it is easy to relate to and it shows how many levels of knowledge are involved in the process, but there is a lot more to the subject. First, the methods given here are not completely realistic, in that human pattern recognition is much more complex than these algorithms. My guess is that the human algorithms will prove to be better than any of the simple man-made ones. Second, it's also important to be able to interpret pictures of arbitrary scenes, not just letters; however, this important topic was not covered here because you just can't do everything and because the principles involved are quite similar.

The Linear Pattern Classifier


This is one of the simplest of all pattern classification algorithms. There is a simple algorithm to find a line, plane or hyperplane that will separate the two classes of patterns you are trying to learn. In effect the line, plane or hyperplane defines the weights of a neural network. One of the beauties of this scheme is that once you find the weights for the network you can throw away all the cases you used to develop the weights. A simple version can be programmed in a hundred lines or so, and my version is also available.
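Here is a minimal sketch of one such algorithm, the classic perceptron learning rule (my own toy version with made-up data; the text's program may differ in details). The weights and bias define the separating hyperplane, and once training converges the cases themselves are no longer needed:

```python
def train_perceptron(data, epochs=20, lr=1.0):
    """Find weights w and bias b so that sign(w.x + b) separates the
    two classes (labels are +1 or -1)."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in data:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if y != t:                       # misclassified: nudge the plane
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1),
        ((2, 2), 1), ((3, 2), 1)]
w, b = train_perceptron(data)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
print([predict(x) for x, _ in data])  # [-1, -1, -1, 1, 1]
```

After training, only `w` and `b` are kept; `data` can be thrown away, which is the beauty of the scheme mentioned above.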

Hopfield Networks
Besides the Hopfield network, this section also contains the Boltzmann machine relaxation algorithm. The Boltzmann machine idea is really very intriguing because of the way it looks up memories; it's radically different from conventional computer architectures. While it's very interesting theoretically, there have been very few applications of the method. In the highly recommended Nanopoulos article he says (in effect) that the microtubules can form a Hopfield network. (I checked with physicist Jack Sarfatti on this to make sure I was interpreting Nanopoulos correctly.) Each tubulin molecule would represent a 1 or 0 depending on which state it's in. I can't imagine how the weights get represented in this system, and especially how a molecule at one end of the MT fiber can influence a molecule at the other end, so I asked Jack the following:

Separating Non-Linearly Separable Classes


In many problems data cannot be separated by lines, planes or hyperplanes and then some non-linear

if the MTs work this way, how does one unit on the far left manage to influence a unit on the far right? (I can imagine one unit affecting a neighboring unit, but that is all.) What are the weights between the MT units and how are they created? Is there quantum "magic" at work here? His reply was: "Yes. We have to look at the quantum version in which the quantum pilot waves provide long-range nonlocal links between the classical switches. Exactly how to do this mathematically I don't know right this minute." So APPARENTLY some quantum "magic" allows a tubulin molecule to connect to every other tubulin molecule, giving a Hopfield-type network, a pretty neat way to implement the algorithm. Nanopoulos also notes that the physical temperature of the tubulin fibers and the strength of the electric fields surrounding them change the characteristics of the fiber, making possible (*I* think) the Boltzmann machine relaxation algorithm. I'd like to see a supercomputer simulate this numerically; it's a good project for someone, I think. The Hopfield and Boltzmann programs are available as C source and DOS binaries.
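Setting the microtubule speculation aside, the ordinary Hopfield memory lookup itself is easy to sketch (a minimal toy version, separate from the C programs mentioned above): store a pattern of +1/-1 units with Hebbian weights, then recall it from a corrupted version by repeatedly updating units until the state settles.

```python
def store(patterns):
    """Hebbian weights: units that are on together get positive links."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, steps=10):
    """Update each unit from its weighted input until things settle."""
    n = len(state)
    state = list(state)
    for _ in range(steps):
        for i in range(n):
            s = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if s >= 0 else -1
    return state

pattern = [1, 1, -1, -1, 1, -1]
w = store([pattern])
noisy = [1, -1, -1, -1, 1, -1]      # one unit flipped
print(recall(w, noisy) == pattern)  # True: the memory is restored
```

This is the "looking up a memory" behavior: the corrupted state falls into the nearest stored pattern. The Boltzmann machine adds noise (a temperature) to the update rule so the network can escape poor local settlements.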

Since backprop is so useful I've spent a lot of time working on a good version of backprop and it is available online.

Pattern Recognition and Curve Fitting


This section shows that what backprop really ends up doing is fitting curves and surfaces to whatever data you give it; it is a form of non-linear regression. It also shows an example of overfitting. If you're doing anything important with backprop you MUST know about overfitting.
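To see the curve-fitting view concretely, here is a stripped-down backprop net in plain Python (my own sketch, not the book's program): one hidden layer of tanh units trained by stochastic gradient descent to fit points from y = x^2.

```python
import math
import random

random.seed(0)

def net(params, x):
    """One hidden layer of tanh units; the output is a weighted sum,
    i.e. a smooth curve whose shape backprop bends toward the data."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def train(data, hidden=5, lr=0.05, epochs=2000):
    w1 = [random.uniform(-1, 1) for _ in range(hidden)]
    b1 = [random.uniform(-1, 1) for _ in range(hidden)]
    w2 = [random.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in data:
            h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
            y = sum(v * hi for v, hi in zip(w2, h)) + b2
            err = y - t
            # backprop: chain rule through the output and hidden layers
            for i in range(hidden):
                dh = err * w2[i] * (1 - h[i] ** 2)
                w2[i] -= lr * err * h[i]
                w1[i] -= lr * dh * x
                b1[i] -= lr * dh
            b2 -= lr * err
    return (w1, b1, w2, b2)

data = [(x / 10, (x / 10) ** 2) for x in range(-10, 11, 2)]
params = train(data)
mse = sum((net(params, x) - t) ** 2 for x, t in data) / len(data)
print(mse)
```

The fitted curve tracks y = x^2 closely on the training points. Overfitting is what happens when a net with too many units bends itself through noisy points exactly and then predicts badly between them.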

Associative Memory and Generalization


This section does several things. First, it demonstrates the use of backprop networks to produce an associative memory and shows how such a network can respond in a reasonable way to patterns it has never seen before; it can give rule-like behavior without actually having any rules. The data for the sports example is in my backprop package. Second, there is the issue of local versus distributed representations in networks. Third, it shows how a network with a distributed representation can do some elementary reasoning. The Perrone and Cooper article on the virtues of averaging the results of several networks, "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks," is available by FTP from the Neuroprose archive at Ohio State. This paper proves that under certain conditions averaging the output units of a number of networks will give you a better estimate than the best of those networks by itself.
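The averaging result is easy to see in a toy numeric form (my own illustration, not from the paper: noisy numbers stand in for the outputs of separately trained networks). The average of several unbiased estimates always has a squared error no larger than the average squared error of the individual estimates:

```python
import random

random.seed(1)

true = 10.0
# Ten "networks", each estimating the same quantity with its own error.
estimates = [true + random.gauss(0, 1) for _ in range(10)]

avg_error = (sum(estimates) / len(estimates) - true) ** 2
mean_single_error = sum((e - true) ** 2 for e in estimates) / len(estimates)
print(avg_error < mean_single_error)  # True
```

This is just the algebraic fact that the square of a mean is at most the mean of the squares; the Perrone and Cooper paper works out the conditions under which the averaged ensemble even beats the single best network.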

Back-Propagation
Again, backprop is really important because of the large number of problems it can be applied to. Notice how many times it was discovered before it finally caught on. If you're familiar with regression, notice how backprop is really just a version of non-linear regression. There are loads of ways to speed up the plain algorithm, and generally speaking you should use them rather than the plain algorithm; however, sometimes the plain version will give the best results, and at this point there is no way to know ahead of time whether an improved version or the plain version will be best. If you want a tutorial on backprop that is much the same as the one in the text, get my postscript file or see the newer HTML file. The Rprop and Quickprop papers are available online; for these and more material on backprop see my Backpropagator's Review.

Rules
This chapter is an abrupt shift to symbol processing techniques; however, note that to a considerable extent it is the symbol processing analog of the material in the two pattern recognition chapters. Rules are simply another way to do pattern recognition, and they have advantages over networks in many types of problems. They look a lot like small neural networks. If you want to put more emphasis on symbol processing in your course, you could actually start with section 4.2 and go on to the online PROLOG chapter or the online LISP chapter.

Introduction

Even though I think it's quite reasonable to say that expert systems do the job of a human expert and can be made using either connectionist or symbolic techniques, the term originated with symbolic systems, and therefore some people tend to think of expert systems as symbolic and not connectionist. I needed a notation to describe symbol processing algorithms, and rather than make up my own I thought it better to use an established symbolic language, PROLOG. As I've already said, PROLOG comes with some pattern matching capabilities built in, and this makes it more convenient for this purpose than LISP. Sometimes people take to one of these two languages better than the other. Personally, if I had to write large programs I'd rather use LISP because it's easier to trace what the program is doing when a bug comes up; I've had a lot of trouble trying to follow debugging output from the PROLOG interpreters I've used.

down to a few rules. Unfortunately the compression of facts about the world down to rules leaves you with problems like conflict resolution.

More Sophisticated Rule Interpretation


Note here how PURE symbol processing does not work very well in reality for many (most?) problems; it has to be extended by the use of real numbers. The symbol processing camp has called such systems symbolic anyway.

The Famous Expert Systems


This section contains descriptions of famous expert systems. Many expert systems are in use today, but little is published about them because companies do not want their competition to know how they work. Note also that real expert systems need a decent human interface, and that's why some sample output is provided in this section.

Some Elementary PROLOG


PROLOG relies on facts and rules, simple rules of the form "if A then B" or "if A and B then C". This section covers simple PROLOG including recursion and the most elementary list processing.

Learning Rules in Soar


SOAR is a nice example of how a program can learn "rules," although to me these "rules" look more like memories. The program keeps these rules/memories of key situations so it does not have to search all over again the next time the same situation comes up. The fact that for some problems it reproduces the power law of practice is intriguing. I've never puttered around with SOAR, but I'd like to if I can ever find the time. If you want to putter around with it, it is available as source code, a binary for Intel machines and a binary for the Mac by FTP from Carnegie-Mellon, or consult the SOAR home page at Carnegie-Mellon. There is a SOAR FAQ at: http://acs.ist.psu.edu/soar-faq/soar-faq.html. SOAR researchers have a commercial site,

Rules and Basic Rule Interpretation Methods


First let me note that there are the ideas of "knowledge engineering" and "knowledge based systems," and it may not be a bad idea to use such terms instead of "artificial intelligence," since these systems seldom display what I require for even a minimal amount of intelligence. This section does the forward and backward chaining versions of Winston's well-known animal identification problem. The forward and backward chaining programs are in the files forward and backward in the PROLOG Programs Package.
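The forward chaining loop itself is tiny; here is a sketch in Python in the spirit of the animal identification example (the rules below are my own simplified, hypothetical set, not Winston's full one, and the real programs are the PROLOG files mentioned above):

```python
# Each rule is (set of required facts, fact to conclude).
RULES = [
    ({"has hair"}, "mammal"),
    ({"gives milk"}, "mammal"),
    ({"mammal", "eats meat"}, "carnivore"),
    ({"carnivore", "tawny", "dark spots"}, "cheetah"),
]

def forward_chain(facts):
    """Keep firing rules whose conditions hold until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

facts = forward_chain({"has hair", "eats meat", "tawny", "dark spots"})
print("cheetah" in facts)  # True
```

Backward chaining runs the same rules in the other direction: start from the hypothesis "cheetah" and recursively check whether each condition can be established.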

Rules vs. Networks


There is a little bit of a comparison here between rules and networks as used in a couple of expert systems. There is also another interesting development going on: extracting rules from backprop networks, and even using rules to make networks.

Conflict Resolution
The only comment I can make here is something that I will really make a point of in Chapter 7. There are many ways to do pattern recognition with cases, rules and networks. Rules are in effect a compressed form of knowledge, that is, you can take many cases and compress (think of the UNIX compress program or the JPEG compression algorithm for pictures) them

Logic

This chapter basically deals with resolution-based theorem proving, a more general version of reasoning than the "if A then B" type rules you find in PROLOG. The chapter uses the notation of a public domain resolution-based theorem proving program called Otter from Argonne National Laboratory. I have tried it a little and it is fairly nice. They have C code, a 32-bit DOS binary, a Macintosh version, user manuals in various formats and some sample problems, available by http and by ftp. A better source of sample problems is the Thousands of Problems for Theorem Provers collection by Geoff Sutcliffe and Christian Suttner. This collection is rather large: a gzipped file of 1M which, when gunzipped, comes to 18M. In all likelihood the only problems you will want are the ones in the puzzles directory. This package is available from the University of Munich by ftp and by http, or from James Cook University, Australia, by ftp and by http.

Otter for solution. This Otter program is in a PROLOG Programs Package.

The Usefulness of Predicate Calculus


PC theorem proving is useful for some special applications; however, some researchers think it is not particularly realistic, in that it does not seem to be the way people operate, and other theories have been proposed to explain human reasoning. Of course, believers in this formal logic approach like to think that the methods in this chapter are realistic, or, if they believe the methods are different from the way people work, they may say that formal logic methods may ultimately turn out to be as good as or better than the human method. What with the great amount of faulty reasoning going on today, I think this topic ought to be required in every high school. On the other hand, it is also an absolutely wonderful example of what is not working, and will never work, with respect to getting programs to duplicate human-level capabilities. The statements you cook up to solve problems are designed to work in a very specific, very limited, idealized situation. It takes quite a lot of intelligence on the part of the person writing the rules to do this. But after that, grinding away (as a program does) does not constitute intelligence. The program has no idea what it is dealing with (people, cats, kangaroos, etc.), and it cannot fill in gaps or correct errors when the human programmer makes a mistake.

Standard Form and Clausal Form


This section notes that there is a notation for predicate calculus that uses the symbols `implies', `is equivalent to', `for all' and `there exists', yet given statements in this notation these connectives can all be eliminated to give a form called clausal form. Of course it's quite easy to just write statements in clausal form in the first place.

Basic Inference Rules


This section gives a number of rules (binary resolution, UR-resolution, and hyperresolution) for deriving new, correct statements from existing statements.
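Binary resolution is the easiest of these to sketch. Here is a toy propositional version in Python (my own simplification: the real rules work on predicate calculus clauses and need unification, which this omits). A clause is a set of literals, "~p" is the negation of "p", and resolving two clauses on a complementary pair yields a new correct clause:

```python
def negate(lit):
    """~p <-> p"""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every binary resolvent of the two clauses: for each
    complementary pair, merge what is left of both clauses."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

c1 = frozenset({"p", "q"})
c2 = frozenset({"~p", "r"})
print(resolve(c1, c2))  # one resolvent: {q, r}
```

Deriving the empty clause this way is how a resolution prover like Otter shows that a set of clauses is contradictory, which is the heart of proof by contradiction.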

Other Logics
Real human logic is not discrete; it is analog, and conclusions come with a confidence factor. For an example of how continuous-valued logic can be formalized see "Beyond Associative Memories: Logics and Variables in Connectionist Models," by Ron Sun, from the Ohio State Neuroprose archive. (The book only mentions this in one VERY short paragraph.) There is just the merest mention of the work of Johnson-Laird and Byrne, whose research on people indicates that people reason by constructing models of the situation and looking for counter-examples. I've never had time to read more than a little of their book, Deduction, yet it seems to fit in rather well with my bias toward image processing. In a previous work called Mental Models, Philip Johnson-Laird proposed the idea of mental models based on his research. In

Controlling Search
Just blindly applying the possible rules to a problem is not a good way to quickly find an answer. This section gives several heuristics that can speed up the search (proof by contradiction, set of support, weighting and PROLOG's strategy). These help, but there is a lot left to be desired.

An Example Using Otter


This contains a sample problem and its translation from English into clauses; the problem is then submitted to

the introduction to Deduction they write that "Mental Models was well-received by cognitive scientists

Complex Architectures
The book starts with some of the simplest pattern recognition algorithms and moves on to higher levels of thought and reasoning. For the most part (not completely!) chapters 2 through 5 deal with one-step pattern recognition problems. But that is not good enough: most problems involve doing many steps of pattern recognition, and we really need to find an architecture that will do many steps of pattern recognition to solve one problem or to cope with the real world. Unfortunately not a lot can be done with this subject. What is done is to introduce the idea of a short-term memory that works in conjunction with a long-term memory. Besides the architecture problem there is also the problem of how to represent thoughts within such a system. Symbol processing advocates simply propose structures of symbols. Neural networking advocates have yet to establish good methods for storing structures of thoughts (unless you consider pictures as structured objects, but hardly anyone has worked on this).

the use of symbols. I believe that Harnad is right in saying that symbols must be defined in terms of pictures: the idea of symbol grounding. The paper by Stevan Harnad, "The Symbol Grounding Problem," is a compressed text file available from the Ohio State neuroprose archive.

Storing Sequential Events


It's rather easy to store a sequence of symbols, like the words in a line of poetry, in a symbol processing format: you simply use a linked list. To do the same thing in a neural networking format you can use a recurrent network with a short-term memory and get it to "memorize" a list of words. The data for the example network that memorizes the two lines of poetry is included with my backprop software.
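The symbolic side really is that simple. Here is a minimal linked list storing and recalling a word sequence; the words themselves are my own placeholder, not the poetry data from the backprop software.

```python
# Symbolic storage of a sequence: a linked list with one word per node.

class Node:
    def __init__(self, word, next=None):
        self.word = word
        self.next = next

def store(words):
    """Build a linked list from a word sequence (built back to front)."""
    head = None
    for word in reversed(words):
        head = Node(word, head)
    return head

def recall(head):
    """Walk the list to reproduce the stored sequence in order."""
    words = []
    while head is not None:
        words.append(head.word)
        head = head.next
    return words

line = "the quick brown fox".split()
print(recall(store(line)))  # ['the', 'quick', 'brown', 'fox']
```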

Structuring Individual Thoughts


This section considers how to store the ideas in simple sentences like "John and Mary ate cookies". This is easily done in a symbol processing framework, but it's not so easily done (as yet) in a neural networking format. This section contains a few of the many ways of storing structured thoughts that have been proposed. I doubt that any of them, neither the neural nor the symbolic, are the "right" one, the one that people use. If you wonder why I'm obsessed with finding out what people do, it's because I suspect whatever people are doing in terms of storing and using information is rather clever. After we find out what is going on we may be able to apply the principles somewhat differently in an artificial system. MacLennan commented on some neural methods that are essentially neural implementations of symbolic methods by saying they are "just putting a fresh coat of paint on old, rotting theories". The RAAM articles by Jordan Pollack are online, first "Implications of Recursive Distributed Representations" from the Ohio State neuroprose archive and second "Recursive Distributed Representations" from the Ohio State neuroprose archive. RAAM is a cute idea but I have to wonder whether it's realistic or not.
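Just to make the symbolic side concrete: the easy way to store "John and Mary ate cookies" is as a small nested structure, a frame with slots. The slot names below are my own choice of illustration, not a scheme from the book.

```python
# A symbolic-processing frame for "John and Mary ate cookies":
# the proposition becomes a structure with named slots.

sentence = {
    "action": "ate",
    "agent": ("John", "Mary"),   # a compound filler for the agent slot
    "object": "cookies",
}

def who_did(frame):
    """Read the agent slot back out of the structure."""
    return frame["agent"]

print(who_did(sentence))  # ('John', 'Mary')
```

The whole difficulty the section describes is that nothing this direct exists for a distributed neural representation: there is no obvious "slot" to read back out of a pattern of activations.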

The Basic Human Architecture


This section lists some requirements for a human-like architecture including short-term and long-term memory.

Flow of Control
The point here is that the human architecture is interrupt-driven, analog and fallible.

The Virtual Symbol Processing Machine Proposal


A short section that describes roughly how a PDP architecture can be used to simulate a symbol processing architecture.

Mental Representation and Computer Representation


This section is largely concerned with the difficulties involved in trying to describe the world entirely with the use of symbols. I believe that Harnad is right in saying that symbols must be defined in terms of pictures, the idea of symbol grounding. The paper by Steven Harnad called "The Symbol Grounding Problem" is a compressed text file and is available from the Ohio State neuroprose archive.

A variation on RAAM called SRAAM, for Sequential RAAM, turns trees into a linear list and then stores the list in a RAAM-like procedure. See the article "Tail-Recursive Distributed Representations and Simple Recurrent Networks" by Stan C. Kwasny and Barry L. Kalman, available by HTTP from Washington University in St. Louis. This is not covered in the book. A newer scheme that may be more realistic, and which I don't cover in this section, is in the article "Holographic Reduced Representations" by Tony Plate, published in the May 1995 IEEE Transactions on Neural Networks and also online from the University of Toronto. Actually, if you FTP to Tony Plate's directory at the University of Toronto you'll find this topic comes in documents of various sizes, including a whole thesis.
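The "turn trees into a linear list" step is easy to show symbolically. Here is my own simplified sketch of that linearization (just bracket markers and a parser, nothing like SRAAM's learned encoding): flatten a nested structure into a flat token list, then rebuild the tree from it.

```python
# Sketch of tree linearization: flatten a nested tuple into a flat
# token list with bracket markers, then invert the process.

def flatten(tree):
    """Preorder-flatten a nested tuple into a flat token list."""
    if not isinstance(tree, tuple):
        return [tree]
    return ['('] + [t for sub in tree for t in flatten(sub)] + [')']

def rebuild(tokens):
    """Invert flatten: parse the token list back into nested tuples."""
    def parse(i):
        if tokens[i] != '(':
            return tokens[i], i + 1
        i += 1
        items = []
        while tokens[i] != ')':
            node, i = parse(i)
            items.append(node)
        return tuple(items), i + 1
    tree, _ = parse(0)
    return tree

t = (('john', 'mary'), ('ate', 'cookies'))
assert rebuild(flatten(t)) == t
print(flatten(t))  # ['(', '(', 'john', 'mary', ')', '(', 'ate', 'cookies', ')', ')']
```

SRAAM's trick is that the list itself is then absorbed item by item into a fixed-width distributed representation by a recurrent network, rather than stored verbatim as it is here.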


Case-Based and Memory-Based Reasoning


This is one of my favorite chapters because it includes very revolutionary and very insightful ideas. As I've mentioned above, there are many ways to do pattern recognition: keep cases, make rules or train networks. Making rules and training networks are ways to compress lots of cases down to a small number of rules or a network. Thus these methods are much like the UNIX compress function or the JPEG compression algorithm for compressing pictures. In fact, in practice they are more like JPEG in that they are also lossy. In the JPEG format the compression scheme will degrade the quality of the picture just a little, but generally not enough that a person can notice any loss of picture quality. Likewise with rules and networks: it's unlikely you can get them to fit your data perfectly, so you experience loss with these methods as well. Of course there is a difference between what rules do and what compress/JPEG does. Compress and JPEG can give you back what you entered. With rules and networks there is no way to get back the original data, although if you start generating inputs for the rules or the network, then every time you get an answer you generate a new case.
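The "keep cases" alternative is worth seeing in miniature. A memory-based classifier stores every case verbatim and answers a query by finding the nearest stored case, with no compression into rules or weights at all. The feature vectors and labels below are invented purely for illustration.

```python
# Minimal memory-based (nearest-neighbor) reasoning: store all cases,
# answer a query with the label of the closest one.

def nearest_case(memory, query):
    """Return the label of the stored case closest to the query,
    using squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(memory, key=lambda case: dist(case[0], query))
    return best[1]

memory = [((0.0, 0.0), 'cold'), ((1.0, 1.0), 'hot'), ((0.2, 0.1), 'cold')]
print(nearest_case(memory, (0.9, 0.8)))  # 'hot'
```

Notice the trade the chapter describes: nothing is ever lost (every case is still there, unlike with lossy rules or networks), but every query scans the whole memory, which is exactly why this approach wants a parallel architecture.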

Condensed vs. Uncondensed Knowledge


I've originated the term "condensed knowledge" here not just to capture the "compression" type of thinking that I've described above but also to make it clear that traditional symbolic systems try to capture knowledge about the world without the system ever having seen complete examples of what it is supposed to know about. So these systems know facts about, say, a rose without ever having seen a complete rose, or smelled one, or been pricked by a thorn. It's a philosophy somewhat like taking a human being and describing one by saying that a person is 95% (or whatever percent) water, x% nitrogen, y% calcium, z% carbon, etc. I chose the word condensed because of its use for the cans of condensed soup that you buy at the store. In brief, this section gives arguments for and against the use of condensed knowledge.

Of course the psychology of using rules and networks to compress data is interesting in itself. First there is the traditional scientific goal of trying to describe how the world operates using as few equations (their version of rules) as necessary. I'm sure this is a major motivation for the symbolic AI community to find rules to express all that is known about the world. The other bit of psychology involved in compressing knowledge is that it's really convenient for von Neumann style computers to work that way. All the computation has to go through a single CPU, and to scan a large database as you do in a nearest neighbor method takes a lot of time. Thus a few rules can deal with a problem much faster than an exhaustive search of memory. But what if you have a machine architecture that can search for the nearest match more easily than it can deal with rules? Even now there are systems like the Connection Machine that can search in parallel. Then there is the scheme used in the Boltzmann machine where, given part of a pattern, you can come up with the rest of the pattern using one cooling session. So it strikes me that between the scientific tradition of forming rules and the constraints of the von Neumann architecture, AI researchers have been mentally trapped into the condensed/compressed knowledge scenario. But in this chapter try to free yourself from such thinking. Consider the virtues of a different architecture, one where you don't even have to bother to come up with rules! Quite a liberating concept if you ask me! But if people operate mostly with cases it poses quite a problem for AI researchers, because coming close to duplicating human capabilities will require vast amounts of storage space as well as an appropriate parallel processing architecture. I just ran into a WWW site that promotes Case Based Reasoning: Welcome to AI-CBR.


Problem Solving and Heuristic Search


Whereas it is easy to observe heuristic search in people solving problems, notice that people have some kind of general architecture that lets them do all sorts of problems, whereas AI researchers have not yet found such a general approach: every new problem must be programmed. One tool called SOAR makes this easier, yet every new problem still requires a special effort. Notice too that in these heuristic programs the approach is normally to list all the possible moves you can make, rate each one and then take the best of these moves. I've never studied how people do such things, but it seems to me that the way a person works is to use a Boltzmann-machine-like strategy to come up with a solution and try it out immediately, without trying to come up with more solutions. If this good solution does not work then the person tries to come up with another one.

The other thing I want to mention is this "flash of insight" behavior, since it may be that quantum mechanics can explain it. In Penrose's book, The Emperor's New Mind, he talks about one case where he was busy trying to solve a certain problem in mathematical physics. Then one day, while he was doing something unrelated, walking down a road discussing a different subject with a colleague, he had a flash of insight into how to solve his mathematical physics problem. Later on he used this insight to solve his problem. There are other similar examples of this that other famous scientists have reported. I'd like you to notice that this flash of insight works a lot like something that happens to ordinary people. Suppose you're simply trying to remember something, for instance who played that part in a certain movie or TV show. It's one of those things that you know you know, but you can't get to the answer, only a faint impression of it. Then perhaps hours later, when you're not even trying, the answer pops into your mind.
Some quantum mechanics enthusiasts suggest that the flash of insight behavior is really the same thing, except in this case you're remembering the future. For instance, in Penrose's case, out there in the future he had the solution in his mind. Now, in the present, all he has to do is "remember" it. If information can be passed along at faster than the speed of light this is quite possible. If that is what is going on, then it gives people a heuristic that no ordinary computer can duplicate. If QM allows you to remember the future, you have to wonder why people don't remember the future all the time, especially useful things like the winning lottery numbers. My answer to this is that if anyone could remember the future with any degree of accuracy we would not have lotteries, because everyone would win and having a lottery would be pointless. If you're going to have a lottery, the "universe" has to find a compromise between everyone remembering the winning numbers and no one winning the lottery. It has to be rare for memories from the future to appear in the present. Mathematicians and other scientists are lucky in that when they have a flash of insight they have the means to prove it correct here in the present.

