
Classification and Prediction


Agenda

- Introduction
- Decision Tree Induction
- Statistical-based algorithms
- Distance-based algorithms
- Rule-based algorithms


Classification vs. Prediction

Classification
- Predicts categorical class labels
- Classifies data: constructs a model based on the training set and the values (class labels) of a classifying attribute, then uses it to classify new data

Prediction
- Models continuous-valued functions, i.e., predicts unknown or missing values

Typical applications
- Loan disbursement: is an applicant risky or safe?
- Medical diagnosis: which of Treatment A, Treatment B, or Treatment C to prescribe?

Classification: A Two-Step Process

Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: classifying future or unknown objects
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model; the accuracy rate is the percentage of test-set samples correctly classified by the model
- The test set is independent of the training set
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Process (1): Model Construction


The training data are fed to a classification algorithm, which produces the classifier (model).

Training Data:

NAME  age          income  loan_decision
Mike  young        low     risky
Mary  young        low     risky
Bill  middle_aged  high    safe
Jim   middle_aged  low     risky
Dave  senior       low     safe
Anne  senior       medium  safe

Classifier (Model):

IF age = young THEN loan_decision = risky ...



Process (2): Using the Model in Prediction

The classifier is first applied to the testing data to estimate its accuracy, then to unseen data.

Testing Data:

NAME   age          income  loan_decision
Tom    senior       low     safe
Crest  middle_aged  low     risky
Yee    middle_aged  high    safe

Unseen Data:

(Henry, middle_aged, low)
Loan_decision? -> risky
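The two steps can be reproduced in a few lines of Python. Below is a minimal sketch using scikit-learn (assuming it is installed); the integer encoding of the categorical attributes is illustrative, not part of the original slides.

    from sklearn.tree import DecisionTreeClassifier

    age = {"young": 0, "middle_aged": 1, "senior": 2}
    income = {"low": 0, "medium": 1, "high": 2}

    # Step 1: model construction from the training data above
    X_train = [[age["young"], income["low"]], [age["young"], income["low"]],
               [age["middle_aged"], income["high"]], [age["middle_aged"], income["low"]],
               [age["senior"], income["low"]], [age["senior"], income["medium"]]]
    y_train = ["risky", "risky", "safe", "risky", "safe", "safe"]
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Step 2: estimate accuracy on the independent test set, then classify unseen data
    X_test = [[age["senior"], income["low"]],
              [age["middle_aged"], income["low"]],
              [age["middle_aged"], income["high"]]]
    y_test = ["safe", "risky", "safe"]
    print(model.score(X_test, y_test))                           # accuracy rate
    print(model.predict([[age["middle_aged"], income["low"]]]))  # Henry -> risky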

Supervised vs. Unsupervised Learning

Supervised learning (classification)
- Supervision: the training data are accompanied by labels indicating the class of each observation, e.g., risky, safe
- New data is classified based on the training set

Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Issues: Data Preparation

Data cleaning
- Preprocess data in order to reduce noise and handle missing values

Relevance analysis (feature selection)
- Remove irrelevant or redundant attributes

Data transformation
- Generalize and/or normalize data


Issues: Evaluating Classification Methods

Accuracy
- Classifier accuracy: predicting the class label
- Predictor accuracy: estimating the value of the predicted attribute

Speed
- Time to construct the model (training time)
- Time to use the model (classification/prediction time)

Robustness: handling noise and missing values
Scalability: efficiency for disk-resident databases
Interpretability: the understanding and insight provided by the model
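As a small illustration of the accuracy measure, with hypothetical predicted and actual labels:

    # accuracy rate = fraction of test samples whose predicted label
    # matches the known label (labels here are hypothetical)
    predicted = ["safe", "risky", "safe", "risky"]
    actual    = ["safe", "risky", "safe", "safe"]
    print(sum(p == a for p, a in zip(predicted, actual)) / len(actual))  # 0.75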


Classification by Decision Tree Induction


A decision tree is a flowchart-like tree structure:
- Each non-leaf node denotes a test on an attribute
- Each branch represents an outcome of the test
- Each leaf node holds a class label
- The topmost node is the root


Decision Tree Induction: Training Dataset


age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no


Output: A Decision Tree for buys_computer

age?
- youth (<=30): student?
    - no:  buys_computer = no
    - yes: buys_computer = yes
- middle_aged (31..40): buys_computer = yes
- senior (>40): credit_rating?
    - excellent: buys_computer = no
    - fair:      buys_computer = yes


Algorithm for Decision Tree Induction

Basic algorithm:
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure, e.g., information gain (sketched below)
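A minimal sketch of the information-gain measure applied to the buys_computer table from the previous slide; the function names are illustrative.

    import math
    from collections import Counter

    def entropy(labels):
        """Info(D) = -sum over classes of p * log2(p)."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, labels, attr):
        """Gain(attr) = Info(D) minus the weighted Info of the partitions induced by attr."""
        parts = {}
        for row, label in zip(rows, labels):
            parts.setdefault(row[attr], []).append(label)
        after = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        return entropy(labels) - after

    # attributes: (age, income, student, credit_rating); labels: buys_computer
    rows = [("<=30","high","no","fair"), ("<=30","high","no","excellent"),
            ("31..40","high","no","fair"), (">40","medium","no","fair"),
            (">40","low","yes","fair"), (">40","low","yes","excellent"),
            ("31..40","low","yes","excellent"), ("<=30","medium","no","fair"),
            ("<=30","low","yes","fair"), (">40","medium","yes","fair"),
            ("<=30","medium","yes","excellent"), ("31..40","medium","no","excellent"),
            ("31..40","high","yes","fair"), (">40","medium","no","excellent")]
    labels = ["no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no"]
    for i, name in enumerate(["age", "income", "student", "credit_rating"]):
        print(name, round(information_gain(rows, labels, i), 3))  # age wins (~0.247)

Because age has the highest gain, it is chosen as the root split, matching the tree shown earlier.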


DT Algorithm


Issues Faced by the DT Algorithm

- Choosing splitting attributes: which attributes are used for splitting impacts performance, e.g., age or credit_rating are useful while student-name is not.
- Ordering of splitting attributes: the order in which attributes are chosen is important. In the example, age is chosen first, then student and credit_rating.
- Splits: the number of splits to take place.
- Tree structure: a balanced tree with the fewest levels is desirable.
- Training data: the structure of the DT depends on the training data. If the training set is very small, the generated tree might not be detailed enough to work properly; if it is too large, the created tree may overfit (it might not work for future states).
- Pruning: once a tree is constructed, some modifications might be needed to improve its performance. The pruning phase removes redundant comparisons or subtrees to achieve better performance.

Presentation of Classification Results


Visualization of a Decision Tree in SGI/MineSet 3.0


Interactive Visual Mining by Perception-Based Classification (PBC)


Statistical-based algorithms

Regression

Bayesian classification


Regression
Regression deals with the estimation of an output value based on input values. When used for classification, the input values are attribute values from the database D and the output values represent the classes. If we know the input parameters x1, x2, ..., xn, then the relationship between the output parameter y and the inputs can be modeled as

y = c0 + c1*x1 + c2*x2 + ... + cn*xn

where c0, c1, ..., cn are the regression coefficients.
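A minimal sketch of estimating such coefficients by least squares; the data points below are hypothetical.

    import numpy as np

    # fit y = c0 + c1*x1 + c2*x2 by least squares (hypothetical data)
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # rows of (x1, x2)
    y = np.array([5.1, 4.9, 11.2, 10.8])
    A = np.hstack([np.ones((X.shape[0], 1)), X])  # leading column of 1s for c0
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(coeffs)  # [c0, c1, c2]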


Linear Regression: Poor Fit


Classification Using Regression


Division: use the regression function to divide the area into regions.
- The data are plotted in an n-dimensional space without any explicit class; through regression, the space is divided into regions, one per class.

Prediction: formulas are generated to predict the output class value.


Division


Prediction


Bayesian Classification: Why?

- A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
- Foundation: based on Bayes' theorem
- Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers
- Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data


Bayesian Theorem: Basics


Let X be a data sample ("evidence"); its class label is unknown.
Let H be the hypothesis that X belongs to class C.
Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X.

P(H) (prior probability of H): the initial probability.
- E.g., X will buy a computer, regardless of age, income, ...

P(X) (prior probability of X): the probability that the sample data is observed.
- E.g., a person from our set of customers is 35 years old and earns $40,000.

P(X|H) (posterior probability): the probability of observing sample X, given that the hypothesis holds.
- E.g., given that X will buy a computer, the probability that X is 31..40 with medium income.

Bayesian Theorem

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

Informally, this can be written as

posterior = likelihood x prior / evidence

Practical difficulty: it requires initial knowledge of many probabilities, at significant computational cost.


Towards the Naïve Bayesian Classifier

Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn).
Suppose there are m classes C1, C2, ..., Cm. Classification is to derive the maximum posterior, i.e., the maximal P(Ci|X). This can be derived from Bayes' theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized.

Naïve Bayesian Classifier: Training Dataset


Classes:
  C1: buys_computer = yes
  C2: buys_computer = no

Data sample to classify:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

Naïve Bayesian Classifier: An Example

P(Ci):
  P(buys_computer = yes) = 9/14 = 0.643
  P(buys_computer = no)  = 5/14 = 0.357

Compute P(X|Ci) for each class:
  P(age <= 30 | buys_computer = yes)            = 2/9 = 0.222
  P(age <= 30 | buys_computer = no)             = 3/5 = 0.600
  P(income = medium | buys_computer = yes)      = 4/9 = 0.444
  P(income = medium | buys_computer = no)       = 2/5 = 0.400
  P(student = yes | buys_computer = yes)        = 6/9 = 0.667
  P(student = yes | buys_computer = no)         = 1/5 = 0.200
  P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
  P(credit_rating = fair | buys_computer = no)  = 2/5 = 0.400

P(X|Ci):
  P(X | buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = no)  = 0.600 x 0.400 x 0.200 x 0.400 = 0.019

P(X|Ci) * P(Ci):
  P(X | buys_computer = yes) * P(buys_computer = yes) = 0.044 x 0.643 = 0.028
  P(X | buys_computer = no)  * P(buys_computer = no)  = 0.019 x 0.357 = 0.007

Therefore, X belongs to the class buys_computer = yes.
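The same computation can be automated. Below is a minimal naïve Bayesian classifier for categorical attributes (no smoothing), mirroring the hand calculation above; the function names are illustrative.

    from collections import Counter, defaultdict

    def train_nb(rows, labels):
        """Count class priors and per-attribute conditional frequencies."""
        priors = Counter(labels)
        cond = defaultdict(Counter)   # cond[(attr_index, class)][value] = count
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                cond[(i, label)][value] += 1
        return priors, cond

    def classify_nb(x, priors, cond):
        """Pick the class Ci maximizing P(Ci) * product over i of P(x_i | Ci)."""
        total = sum(priors.values())
        best, best_score = None, -1.0
        for label, count in priors.items():
            score = count / total
            for i, value in enumerate(x):
                score *= cond[(i, label)][value] / count
            if score > best_score:
                best, best_score = label, score
        return best

    # the buys_computer table: (age, income, student, credit_rating) -> class
    rows = [("<=30","high","no","fair"), ("<=30","high","no","excellent"),
            ("31..40","high","no","fair"), (">40","medium","no","fair"),
            (">40","low","yes","fair"), (">40","low","yes","excellent"),
            ("31..40","low","yes","excellent"), ("<=30","medium","no","fair"),
            ("<=30","low","yes","fair"), (">40","medium","yes","fair"),
            ("<=30","medium","yes","excellent"), ("31..40","medium","no","excellent"),
            ("31..40","high","yes","fair"), (">40","medium","no","excellent")]
    labels = ["no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no"]
    priors, cond = train_nb(rows, labels)
    print(classify_nb(("<=30", "medium", "yes", "fair"), priors, cond))  # -> "yes"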

Naïve Bayesian Classifier: Comments

Advantages
- Easy to implement
- Good results obtained in most cases

Disadvantages
- Assumes class-conditional independence, therefore a loss of accuracy
- In practice, dependencies exist among variables


Rule-based Classification


Using IF-THEN Rules for Classification

Represent the knowledge in the form of IF-THEN rules, r = <a, c>, where a is the antecedent and c is the consequent:

R: IF age = youth AND student = yes THEN buys_computer = yes

Rules can be generated by many techniques, such as decision trees and neural networks.


Generating Rules from DTs


The algorithm generates one rule for each leaf node in the DT. All rules with the same consequent can then be combined by ORing the antecedents of the simpler rules, as in the sketch below.
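A minimal sketch of extracting one rule per leaf from the buys_computer tree shown earlier; the nested-dict tree encoding is illustrative, not from the original slides.

    # each internal node maps an attribute test to its branches; leaves hold a class label
    tree = {"age": {
        "youth": {"student": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }}

    def rules_from_tree(node, conditions=()):
        if not isinstance(node, dict):            # leaf: emit a finished rule
            antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
            print(f"IF {antecedent} THEN buys_computer = {node}")
            return
        (attr, branches), = node.items()          # one attribute test per node
        for value, child in branches.items():
            rules_from_tree(child, conditions + ((attr, value),))

    rules_from_tree(tree)   # prints one IF-THEN rule per leaf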


Algorithm for Generating Rules from DTs


Generating Rules Example


Distance-based algorithms


Classification Using Distance

Items mapped to the same class are more similar to the other items in that class than to items in other classes, so distance (or similarity) measures may be used to identify the alikeness of different items in the database:
- Place each item in the class to which it is closest.
- This requires determining the distance between an item and a class.


Classification Using a Simple Distance-Based Algorithm


Classes are represented by a centroid: a central value or representative vector, e.g.
  Class A is represented by <4, 8>
  Class B by <2, 3>
  Class C by <6, 3>


Simple Distance-Based Algorithm


Input:
  c1, ..., cm   // centroids for each class
  t             // input tuple to classify
Output:
  c             // class to which t is assigned

dist = ∞;
for i = 1 to m do
  if d(ci, t) < dist then   // d(.,.) is the distance measure
    c = i;
    dist = d(ci, t);
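A runnable Python version of the same algorithm, using Euclidean distance and the centroids from the previous slide; the query tuple is hypothetical.

    import math

    def classify_by_centroid(centroids, t):
        """Assign tuple t to the class whose centroid is nearest."""
        best_class, best_dist = None, math.inf
        for label, c in centroids.items():
            d = math.dist(c, t)            # Euclidean distance
            if d < best_dist:
                best_class, best_dist = label, d
        return best_class

    centroids = {"A": (4, 8), "B": (2, 3), "C": (6, 3)}
    print(classify_by_centroid(centroids, (5, 4)))   # -> "C"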


K Nearest Neighbors (KNN)


- A common classification scheme.
- A lazy learner: no model is built in advance; classification is deferred until a new item must be labeled.
- The training set includes the data along with their classes.
- Examine the K items closest to the item to be classified.
- The new item is placed in the class containing the most of those K closest items.


KNN
K = 3. The three closest items in the training set are shown; T will be placed in the class to which most of them belong.
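A minimal KNN sketch with Euclidean distance; the training points and the query are hypothetical.

    import math
    from collections import Counter

    def knn_classify(training, t, k=3):
        """Return the majority class among the k nearest training items."""
        nearest = sorted(training, key=lambda item: math.dist(item[0], t))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    training = [((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
                ((6, 6), "B"), ((7, 7), "B"), ((6, 7), "B")]
    print(knn_classify(training, (2, 1), k=3))   # -> "A"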


KNN Algorithm

