
Knowledge engineering and data mining

In which we discuss how to pick the right tool for the job, build an intelligent system and turn data into knowledge.

9.1 Introduction, or what is knowledge engineering?


Choosing the right tool for the job is undoubtedly the most critical part of building an intelligent system. Having read this far, you are now familiar with rule- and frame-based expert systems, fuzzy systems, artificial neural networks, genetic algorithms, and hybrid neuro-fuzzy and fuzzy evolutionary systems. Although several of these tools handle many problems well, selecting the one best suited to a particular problem can be difficult. Davis's law states: "For every tool there is a task perfectly suited to it" (Davis and King, 1977). However, it would be too optimistic to assume that for every task there is a tool perfectly suited to it. In this chapter, we suggest basic guidelines for selecting an appropriate tool for a given task, consider the main steps in building an intelligent system and discuss how to turn data into knowledge.

The process of building an intelligent system begins with gaining an understanding of the problem domain. We first must assess the problem and determine what data are available and what is needed to solve the problem. Once the problem is understood, we can choose an appropriate tool and develop the system with this tool. The process of building intelligent knowledge-based systems is called knowledge engineering. It has six basic phases (Waterman, 1986; Durkin, 1994):

1 Problem assessment
2 Data and knowledge acquisition
3 Development of a prototype system
4 Development of a complete system
5 Evaluation and revision of the system
6 Integration and maintenance of the system


Figure 9.1 The process of knowledge engineering

The process of knowledge engineering is illustrated in Figure 9.1. Knowledge engineering, despite its name, is still more art than engineering, and a real process of developing an intelligent system is not as neat and clean as Figure 9.1 might suggest. Although the phases are shown in sequence, they usually overlap considerably. The process itself is highly iterative, and at any time we may engage in any development activities. Let us now examine each phase in more detail.


9.1.1 Problem assessment

During this phase we determine the problem's characteristics, identify the project's participants, specify the project's objectives and determine what resources are needed for building the system.

To characterise the problem, we need to determine the problem type, input and output variables and their interactions, and the form and content of the solution.

The first step is to determine the problem type. Typical problems often addressed by intelligent systems are illustrated in Table 9.1. They include diagnosis, selection, prediction, classification, clustering, optimisation and control. The problem type influences our choice of the tool for building an intelligent system. Suppose, for example, we develop a system to detect faults in an electric circuit and guide the user through the diagnostic process. This problem clearly belongs to diagnosis. Domain knowledge in such problems can often be represented by production rules, and thus a rule-based expert system might be the right candidate for the job.

Of course, the choice of a building tool also depends on the form and content of the solution. For example, systems that are built for diagnostic tasks usually need explanation facilities, the means that enable them to justify their solutions. Such facilities are an essential component of any expert system, but are not available in neural networks. On the other hand, a neural network might be a good choice for classification and clustering problems where the results are often more important than understanding the system's reasoning process.

Table 9.1 Typical problems addressed by intelligent systems

Problem type      Description
Diagnosis         Inferring malfunctions of an object from its behaviour and recommending solutions.
Selection         Recommending the best option from a list of possible alternatives.
Prediction        Predicting the future behaviour of an object from its behaviour in the past.
Classification    Assigning an object to one of the defined classes.
Clustering        Dividing a heterogeneous group of objects into homogeneous subgroups.
Optimisation      Improving the quality of solutions until an optimal one is found.
Control           Governing the behaviour of an object to meet specified requirements in real-time.

The next step in the problem assessment is to identify the participants in the project. Two critical participants in any knowledge engineering project are the knowledge engineer (a person capable of designing, building and testing an intelligent system) and the domain expert (a knowledgeable person capable of solving problems in a specific area or domain).

Then we specify the project's objectives, such as gaining a competitive edge, improving the quality of decisions, reducing labour costs, and improving the quality of products and services.

Finally, we determine what resources are needed for building the system. They normally include computer facilities, development software, knowledge and data sources (human experts, textbooks, manuals, web sites, databases and examples) and, of course, money.
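The remark above that diagnostic knowledge can often be represented by production rules, and that diagnostic systems need explanation facilities, can be made concrete with a small sketch. The three circuit rules, the fact strings and the function name `forward_chain` are all hypothetical and chosen only for illustration; they are not taken from any real diagnostic system. The sketch fires rules until nothing new can be inferred and keeps a trace of which rule produced each conclusion, which is one simple way of justifying a result.

```python
# Hypothetical production rules for a toy circuit-fault domain.
# Each rule: (name, set of conditions that must all hold, conclusion).
RULES = [
    ("R1", {"lamp is off", "switch is on"}, "no current reaches the lamp"),
    ("R2", {"no current reaches the lamp", "fuse is blown"}, "replace the fuse"),
    ("R3", {"no current reaches the lamp", "fuse is intact"}, "replace the lamp"),
]


def forward_chain(facts, rules):
    """Fire rules until no new fact is inferred; record why each fact was added."""
    known = set(facts)
    trace = []  # a very simple explanation facility
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule fires when all its conditions are known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                premises = " and ".join(sorted(conditions))
                trace.append(f"{name}: {premises} -> {conclusion}")
                changed = True
    return known, trace


# Assumed observations supplied by the user.
known, trace = forward_chain(
    {"lamp is off", "switch is on", "fuse is blown"}, RULES
)
for step in trace:
    print(step)
```

The trace is what separates this approach from a trained network in the sense discussed above: every conclusion can be traced back to the named rules that produced it.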

9.1.2 Data and knowledge acquisition

During this phase we obtain further understanding of the problem domain by collecting and analysing both data and knowledge, and making key concepts of the system's design more explicit.

Data for intelligent systems are often collected from different sources, and thus can be of different types. However, a particular tool for building an intelligent system requires a particular type of data. Some tools deal with continuous variables, while others need to have all variables divided into several ranges, or to be normalised to a single range, say from 0 to 1. Some handle symbolic (textual) data, while others use only numerical data. Some tolerate imprecise and noisy data, while others require only well-defined, clean data. As a result, the data must be transformed, or "massaged", into the form useful for a particular tool. However, no matter which tool we choose, there are three important issues that must be resolved before massaging the data (Berry and Linoff, 1997).

The first issue is incompatible data. Often the data we want to analyse store text in EBCDIC coding and numbers in packed decimal format, while the tools we want to use for building intelligent systems store text in the ASCII code and numbers as integers or as single- or double-precision floating-point values. This issue is normally resolved with data transport tools that automatically produce the code for the required data transformation.

The second issue is inconsistent data. Often the same facts are represented differently in different databases. If these differences are not spotted and resolved in time, we might find ourselves, for example, analysing consumption patterns of carbonated drinks using data that do not include Coca-Cola just because they were stored in a separate database.

The third issue is missing data. Actual data records often contain blank fields. Sometimes we might throw such incomplete records away, but normally we would attempt to infer some useful information from them. In many cases, we can simply fill the blank fields in with the most common or average values. In other cases, the fact that a particular field has not been filled in might itself provide us with very useful information. For example, in a job application form, a blank field for a business phone number might suggest that an applicant is currently unemployed.
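The two simplest massaging steps described above, normalising a variable to the range from 0 to 1 and filling blank fields with the most common or average values, might look as follows. This is a minimal sketch: the function names and the sample records (floor areas and heating types) are assumptions made for illustration, and min-max scaling is only one of several possible normalisation schemes.

```python
from collections import Counter
from statistics import mean


def normalise(values):
    """Rescale numbers linearly to the range 0..1 (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fill_numeric(values):
    """Replace blank (None) numbers with the average of the known ones."""
    avg = mean(v for v in values if v is not None)
    return [avg if v is None else v for v in values]


def fill_symbolic(values):
    """Replace blank (None) symbols with the most common known one."""
    known = [v for v in values if v is not None]
    most_common = Counter(known).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


def missing_flags(values):
    """Keep the fact that a field was blank as information in its own right."""
    return [v is None for v in values]


# Hypothetical records: floor area in square metres and type of heating.
areas = [120.0, None, 80.0, 100.0]
heating = ["gas", "gas", None, "electric"]

area_was_blank = missing_flags(areas)    # record blanks before filling them
filled_areas = fill_numeric(areas)       # blank -> average of known areas
scaled_areas = normalise(filled_areas)   # now within 0..1
filled_heating = fill_symbolic(heating)  # blank -> most common value
```

Recording `area_was_blank` before filling preserves the information that a field was left empty, which, as the job application example suggests, can itself be useful.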

Our choice of the system building tool depends on the acquired data. As an example, we can consider a problem of estimating the market value of a property based on its features. This problem can be handled by both expert system and neural network technologies. Therefore, before deciding which tool to apply, we should investigate the available data. If, for instance, we can obtain recent sale prices for houses throughout the region, we might train a neural network by using examples of previous sales rather than develop an expert system using knowledge of an experienced appraiser.

The task of data acquisition is closely related to the task of knowledge acquisition. In fact, we acquire some knowledge about the problem domain while collecting the data.


What are the stages in the knowledge acquisition process?


Usually we start with reviewing documents and reading books, papers and manuals related to the problem domain. Once we become familiar with the problem, we can collect further knowledge through interviewing the domain expert. Then we study and analyse the acquired knowledge, and repeat the entire process again. Knowledge acquisition is an inherently iterative process.

During a number of interviews, the expert is asked to identify four or five typical cases, describe how he or she solves each case and explain, or "think out loud", the reasoning behind each solution (Russell and Norvig, 2002). However, extracting knowledge from a human expert is a difficult process; it is often called the "knowledge acquisition bottleneck". Quite often experts are unaware of what knowledge they have and the problem-solving strategy they use, or are unable to verbalise it. Experts may also provide us with irrelevant, incomplete or inconsistent information.

Understanding the problem domain is critical for building intelligent systems. A classical example is given by Donald Michie (1982). A cheese factory had a very experienced cheese-tester who was approaching retirement age. The factory manager decided to replace him with an "intelligent machine". The human tester tested the cheese by sticking his finger into a sample and deciding if it "felt right". So it was assumed the machine had to do the same: test for the right surface tension. But the machine was useless. Eventually, it turned out that the human tester subconsciously relied on the cheese's smell rather than on its surface tension and used his finger just to break the crust and let the aroma out.

The data and knowledge acquired during the second phase of knowledge engineering should enable us to describe the problem-solving strategy at the most abstract, conceptual, level and choose a tool for building a prototype. However, we must not make a detailed analysis of the problem before evaluating the prototype.

9.1.3 Development of a prototype system

This actually involves creating an intelligent system or, rather, a small version of it and testing it with a number of test cases.


What is a prototype?
A prototype system can be defined as a small version of the final system. It is designed to test how well we understand the problem, or in other words to make sure that the problem-solving strategy, the tool selected for building a system, and techniques for representing acquired data and knowledge are adequate to the task. It also provides us with an opportunity to persuade the sceptics and, in many cases, to actively engage the domain expert in the system's development.

After choosing a tool, massaging the data and representing the acquired knowledge in the form suitable for that tool, we design and then implement a prototype version of the system. Once it is built, we examine (usually together with the domain expert) the prototype's performance by testing it with a variety of test cases. The domain expert takes an active part in testing the system, and as a result becomes more involved in the system's development.

What is a test case?


A test case is a problem successfully solved in the past for which input data and an output solution are known. During testing, the system is presented with the same input data and its solution is compared with the original solution.
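Testing with such cases can be sketched as a small harness that presents each stored input to the system and compares the answer with the original solution. The names `run_test_cases` and `prototype`, and the three sample cases, are hypothetical; a real project would use cases supplied by the domain expert.

```python
def run_test_cases(system, test_cases):
    """Return the fraction of cases solved as before, plus the mismatches."""
    failures = []
    for inputs, expected in test_cases:
        actual = system(inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    passed = len(test_cases) - len(failures)
    return passed / len(test_cases), failures


def prototype(inputs):
    """Stand-in for the prototype system (an assumption, not a real system)."""
    if not inputs["power_light_on"]:
        return "check power cord"
    return "check monitor cable"


# Each test case pairs known input data with the original, known solution.
test_cases = [
    ({"power_light_on": False}, "check power cord"),
    ({"power_light_on": True}, "check monitor cable"),
    ({"power_light_on": True}, "replace monitor"),
]

score, failures = run_test_cases(prototype, test_cases)
```

The list of mismatches, rather than the score alone, is what the knowledge engineer and the domain expert would review together.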

What should we do if we have made a bad choice of the system-building tool?


We should throw the prototype away and start the prototyping phase over again; any attempt to force an ill-chosen tool to suit a problem it wasn't designed for would only lead to further delays in the system's development. The main goal of the prototyping phase is to obtain a better understanding of the problem, and thus by starting this phase with a new tool, we waste neither time nor money.

9.1.4 Development of a complete system

As soon as the prototype begins functioning satisfactorily, we can assess what is actually involved in developing a full-scale system. We develop a plan, schedule and budget for the complete system, and also clearly define the system's performance criteria.

The main work at this phase is often associated with adding data and knowledge to the system. If, for example, we develop a diagnostic system, we might need to provide it with more rules for handling specific cases. If we develop a prediction system, we might need to collect additional historical examples to make predictions more accurate.

The next task is to develop the user interface, the means of delivering information to a user. The user interface should make it easy for users to obtain any details they need. Some systems may be required to explain their reasoning process and justify their advice, analysis or conclusion, while others need to represent their results in a graphical form.

The development of an intelligent system is, in fact, an evolutionary process. As the project proceeds and new data and knowledge are collected and added to the system, its capability improves and the prototype gradually evolves into a final system.


9.1.5 Evaluation and revision of the system

Intelligent systems, unlike conventional computer programs, are designed to solve problems that quite often do not have clearly defined "right" and "wrong" solutions. To evaluate an intelligent system is, in fact, to assure that the system performs the intended task to the user's satisfaction. A formal evaluation of the system is normally accomplished with the test cases selected by the user. The system's performance is compared against the performance criteria that were agreed upon at the end of the prototyping phase.

The evaluation often reveals the system's limitations and weaknesses, so it is revised and relevant development phases are repeated.

9.1.6 Integration and maintenance of the system

This is the final phase in developing the system. It involves integrating the system into the environment where it will operate and establishing an effective maintenance program.

By "integrating" we mean interfacing a new intelligent system with existing systems within an organisation and arranging for technology transfer. We must make sure that the user knows how to use and maintain the system. Intelligent systems are knowledge-based systems, and because knowledge evolves over time, we need to be able to modify the system.

But who maintains the system?


Once the system is integrated in the working environment, the knowledge engineer withdraws from the project. This leaves the system in the hands of its users. Thus, the organisation that uses the system should have in-house expertise to maintain and modify the system.

Which tool should we use?


As must be clear by now, there is no single tool that is applicable to all tasks. Expert systems, neural networks, fuzzy systems and genetic algorithms all have a place and all find numerous applications. Only two decades ago, in order to apply an intelligent system (or, rather, an expert system), one had first to find a "good" problem, a problem that had some chance for success. Knowledge engineering projects were expensive, laborious and had high investment risks. The cost of developing a moderate-sized expert system was typically between $250,000 and $500,000 (Simon, 1987). Such "classic" expert systems as DENDRAL and MYCIN took 20 to 40 person-years to complete.

Fortunately, the last few years have seen a dramatic change in the situation. Today, most intelligent systems are built within months rather than years. We use commercially available expert system shells, fuzzy, neural network and evolutionary computation toolboxes, and run our applications on standard PCs. And most importantly, adopting new intelligent technologies is becoming problem-driven, rather than curiosity-driven as it often was in the past. Nowadays an organisation addresses its problems with appropriate intelligent tools.

In the following sections, we discuss applications of different tools for solving specific problems.

9.2 Will an expert system work for my problem?


Case study 1: Diagnostic expert systems

I want to develop an intelligent system that can help me to fix malfunctions of my Mac computer. Will an expert system work for this problem?

There is an old but still useful test for prime candidates for expert systems. It is called the Phone Call Rule (Firebaugh, 1988): "Any problem that can be solved by your in-house expert in a 10-30 minute phone call can be developed as an expert system".

Diagnosis and troubleshooting problems (of course, computer diagnosis is one of them) have always been very attractive candidates for expert system technology. As you may recall, medical diagnosis was one of the first areas to which expert systems were applied. Since then, diagnostic expert systems have found numerous applications, particularly in engineering and manufacturing.

Diagnostic expert systems are relatively easy to develop: most diagnostic problems have a finite list of possible solutions, involve a rather limited amount of well-formalised knowledge, and often take a human expert a short time (say, an hour) to solve.

To develop a computer diagnostic system, we need to acquire knowledge about troubleshooting in computers. We might find and interview a hardware specialist, but for a small expert system there is a better alternative: to use a troubleshooting manual. It provides step-by-step procedures for detecting and fixing a variety of faults. In fact, such a manual contains knowledge in the most concise form that can be directly used in an expert system. There is no need to interview an expert, and thus we can avoid the knowledge acquisition bottleneck.

Computer manuals often include troubleshooting sections, which consider possible problems with the system start-up, computer/peripherals (hard disk, keyboard, monitor, printer), disk drives (floppy disk, CD-ROM), files, and network and file sharing. In our example, we will consider only troubleshooting the Mac system start-up. However, once the prototype expert system is developed, you can easily expand it.

Figure 9.2 illustrates the troubleshooting procedure for the Macintosh computer. As you can see, troubleshooting here is carried out through a series of visual inspections, or tests. We first collect some initial information (the system does not start), infer from it whatever can be inferred, gather additional information (power cords are OK, Powerstrip is OK, etc.) and finally identify
