Agenda
- Introduction
- Decision Tree Induction
- Statistical-Based Algorithms
- Distance-Based Algorithms
- Rule-Based Algorithms
Classification
- predicts categorical class labels
- constructs a model from the training set and the values (class labels) of a classifying attribute, then uses it to classify new data

Prediction
- models continuous-valued functions, i.e., predicts unknown or missing values

Typical applications
- Loan disbursement: is an applicant risky or safe?
- Medical diagnosis: which of Treatment A, Treatment B, or Treatment C to prescribe?
Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: classifying future or unknown objects (a code sketch follows the figure below)
- Estimate the accuracy of the model
  - The known label of each test sample is compared with the model's classification of that sample
  - The accuracy rate is the percentage of test-set samples correctly classified by the model
  - The test set is independent of the training set
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
Model construction -- Training Data (used to build the Classifier/Model):

  income   loan_decision
  low      risky
  low      risky
  high     safe
  low      risky
  low      safe
  medium   safe

Model usage -- Test Data (used to estimate accuracy):

  NAME    age          income   loan_decision
  Tom     senior       low      safe
  Crest   middle_aged  low      risky
  Yee     middle_aged  high     safe

Unseen Data: (Henry, middle_aged, low)
Loan_decision?  ->  risky
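A minimal sketch of this two-phase workflow, assuming pandas and scikit-learn (the slides name no particular library); only the income attribute is used, matching the training-data figure:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Phase 1 -- model construction from the labeled training set above.
train = pd.DataFrame({
    "income":        ["low", "low", "high", "low", "low", "medium"],
    "loan_decision": ["risky", "risky", "safe", "risky", "safe", "safe"],
})
X = pd.get_dummies(train[["income"]])   # one-hot encode the categorical attribute
y = train["loan_decision"]
model = DecisionTreeClassifier().fit(X, y)

# Phase 2 -- model usage: classify the unseen tuple (Henry, middle_aged, low).
unseen = pd.get_dummies(pd.DataFrame({"income": ["low"]}))
unseen = unseen.reindex(columns=X.columns, fill_value=0)
print(model.predict(unseen))            # expected: ['risky']
```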
Supervised learning (classification)
- Supervision: the training data are accompanied by labels indicating the class of each observation (e.g., risky, safe)
- New data is classified based on the training set

Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Preparing the data for classification
- Data cleaning: preprocess the data to reduce noise and handle missing values
- Data transformation: generalize and/or normalize the data
Evaluating classification methods
- Accuracy (see the sketch after this list)
  - classifier accuracy: predicting class labels correctly
  - predictor accuracy: guessing the value of predicted attributes
- Speed
  - time to construct the model (training time)
  - time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency for disk-resident databases
- Interpretability: the understanding and insight provided by the model
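A minimal sketch of the accuracy criterion: the fraction of test-set samples whose predicted label matches the known label (function name and data are illustrative):

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of test samples classified correctly."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

print(accuracy(["safe", "risky", "safe"], ["safe", "risky", "risky"]))  # ~0.67
```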
A decision tree is a flowchart-like tree structure:
- each non-leaf node denotes a test on an attribute
- each branch represents an outcome of the test
- each leaf node holds a class label
- the topmost node is the root
Example decision tree for buys_computer:

age?
├─ youth (<=30)         -> student?
│                           ├─ no  -> no
│                           └─ yes -> yes
├─ middle_aged (31..40) -> yes
└─ senior               -> credit_rating?
                            ├─ excellent -> no
                            └─ fair      -> yes
Basic algorithm
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Examples are partitioned recursively, based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure, e.g., information gain (a sketch follows under "DT Algorithm")
DT Algorithm
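A minimal Python sketch of the top-down, recursive, divide-and-conquer scheme described above, using information gain as the attribute-selection measure (function and variable names are illustrative):

```python
import math
from collections import Counter

def entropy(rows, target):
    """Expected information (bits) needed to classify a tuple in rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    """Reduction in entropy obtained by partitioning rows on attr."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:            # pure partition -> leaf with that class
        return labels[0]
    if not attributes:                   # no tests left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    branches = {}
    for value in {row[best] for row in rows}:   # one branch per test outcome
        subset = [row for row in rows if row[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, rest, target)
    return {best: branches}
```

Called as build_tree(rows, ["age", "student", "credit_rating"], "buys_computer") on a list of dicts, it returns a nested dictionary mirroring the tree shown earlier.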
Issues in decision tree induction
- Choosing splitting attributes: which attributes are used for splitting impacts performance; e.g., age or credit_rating are useful, while a student's name is not
- Ordering of splitting attributes: the order in which attributes are chosen is important; in the example, age is chosen first, then student and credit_rating
- Splits: the number of splits to take place
- Tree structure: a balanced tree with the fewest levels is desirable
- Training data: the structure of the tree depends on the training data; if the training set is very small, the generated tree may not be general enough to work properly, and if it is too large, the tree may overfit and fail to generalize to future cases
- Pruning: once a tree is constructed, modifications may be needed to improve its performance; the pruning phase removes redundant comparisons or prunes subtrees
Statistical-Based Algorithms
- Regression
- Bayesian classification
Regression
Regression deals with estimating an output value from input values. When regression is used for classification, the input values are attribute values from the database D and the output values represent the classes. If we know the input parameters x1, x2, ..., xn, the relationship between the output parameter y and the inputs can be modeled as

    y = c0 + c1*x1 + c2*x2 + ... + cn*xn

where c0, c1, ..., cn are the regression coefficients.
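A minimal sketch of estimating the coefficients c0..cn by least squares with NumPy (the fitting method and the data are illustrative assumptions; the slides prescribe neither):

```python
import numpy as np

# Illustrative data generated from y = 1 + x1 + 2*x2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # rows of (x1, x2)
y = np.array([6.0, 5.0, 12.0, 11.0])

A = np.column_stack([np.ones(len(X)), X])       # prepend a 1-column for the intercept c0
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A c - y||^2
print(coeffs)                                   # [1. 1. 2.] -> y = 1 + 1*x1 + 2*x2
```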
Division: the regression function is used to divide the data space into regions, one per class.
Prediction: the regression function is used to predict a class membership value for a new item.
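A hedged sketch of one common way to realize this for two classes: encode the classes as 0 and 1, fit a regression function, and threshold the predicted membership value at 0.5 (data are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])  # single input attribute
y = np.array([0, 0, 0, 1, 1, 1])              # class A -> 0, class B -> 1

c1, c0 = np.polyfit(x, y, 1)                  # fit y = c0 + c1*x
predict = lambda v: "B" if c0 + c1 * v >= 0.5 else "A"
print(predict(2.5), predict(6.5))             # A B
```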
Bayesian classifiers
- A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
- Foundation: based on Bayes' theorem
- Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, so prior knowledge can be combined with observed data
- Let X be a data sample ("evidence"); its class label is unknown
- Let H be the hypothesis that X belongs to class C
- Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X
- P(H) (prior probability of H): the initial probability of the hypothesis, before X is observed
- P(X) (prior probability of X): the probability that the sample data is observed, e.g., that a person from our set of customers is 35 years old and earns $40,000
- P(X|H) (likelihood): the probability of observing sample X given that the hypothesis holds; e.g., given that X will buy a computer, the probability that X is aged 31..40 with medium income
Bayes' Theorem

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

    P(H|X) = P(X|H) P(H) / P(X)
Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn). Suppose there are m classes C1, C2, ..., Cm. Classification derives the maximum posterior, i.e., the maximal P(Ci|X). By Bayes' theorem,

    P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized.
Classes:
  C1: buys_computer = yes
  C2: buys_computer = no

Data sample to classify:
  X = (age = <=30, income = medium, student = yes, credit_rating = fair)

Training data:

  age      income   student   credit_rating   buys_computer
  <=30     high     no        fair            no
  <=30     high     no        excellent       no
  31..40   high     no        fair            yes
  >40      medium   no        fair            yes
  >40      low      yes       fair            yes
  >40      low      yes       excellent       no
  31..40   low      yes       excellent       yes
  <=30     medium   no        fair            no
  <=30     low      yes       fair            yes
  >40      medium   yes       fair            yes
  <=30     medium   yes       excellent       yes
  31..40   medium   no        excellent       yes
  31..40   high     yes       fair            yes
  >40      medium   no        excellent       no
P(Ci):
  P(buys_computer = yes) = 9/14 = 0.643
  P(buys_computer = no)  = 5/14 = 0.357

P(Xk|Ci):
  P(age = <=30           | buys_computer = yes) = 2/9 = 0.222
  P(age = <=30           | buys_computer = no)  = 3/5 = 0.600
  P(income = medium      | buys_computer = yes) = 4/9 = 0.444
  P(income = medium      | buys_computer = no)  = 2/5 = 0.400
  P(student = yes        | buys_computer = yes) = 6/9 = 0.667
  P(student = yes        | buys_computer = no)  = 1/5 = 0.200
  P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
  P(credit_rating = fair | buys_computer = no)  = 2/5 = 0.400

P(X|Ci):
  P(X | buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = no)  = 0.600 x 0.400 x 0.200 x 0.400 = 0.019

P(X|Ci) P(Ci):
  P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 x 0.643 = 0.028
  P(X | buys_computer = no)  P(buys_computer = no)  = 0.019 x 0.357 = 0.007

Therefore, X belongs to class C1 (buys_computer = yes).
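A minimal Python sketch reproducing this hand computation from the 14-tuple table above (no smoothing, matching the calculation shown):

```python
# (age, income, student, credit_rating, buys_computer) -- the table above.
data = [
    ("<=30", "high", "no", "fair", "no"),         ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),      (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),         (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),        (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),     (">40", "medium", "no", "excellent", "no"),
]
X = ("<=30", "medium", "yes", "fair")            # the sample to classify

def score(label):
    rows = [r for r in data if r[4] == label]
    prior = len(rows) / len(data)                # P(Ci), e.g. 9/14 for "yes"
    likelihood = 1.0
    for i, value in enumerate(X):                # naive independence assumption
        likelihood *= sum(r[i] == value for r in rows) / len(rows)
    return prior * likelihood                    # P(X|Ci) * P(Ci)

print(max(["yes", "no"], key=score))             # -> yes (0.028 vs 0.007)
```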
Advantages
- Easy to implement
- Good results obtained in most cases

Disadvantages
- Assumes class-conditional independence, which causes a loss of accuracy, because in practice dependencies do exist among variables
Rule-based classifiers represent the knowledge in the form of IF-THEN rules, r = <a, c>, where a is the antecedent (condition) and c is the consequent (conclusion). For example:

R: IF age = youth AND student = yes THEN buys_computer = yes
Rules are generated by many techniques, such as decision trees and neural networks.
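A minimal sketch of the r = <a, c> representation above, with the antecedent as a set of attribute-value conditions and the consequent as a class assignment (all names are illustrative):

```python
# Rule R: IF age = youth AND student = yes THEN buys_computer = yes
rule = ({"age": "youth", "student": "yes"},   # antecedent a
        ("buys_computer", "yes"))             # consequent c

def antecedent_holds(rule, tuple_):
    """True if every condition in the antecedent matches the tuple."""
    antecedent, _ = rule
    return all(tuple_.get(attr) == value for attr, value in antecedent.items())

X = {"age": "youth", "student": "yes", "credit_rating": "fair"}
if antecedent_holds(rule, X):
    attr, value = rule[1]
    print(attr, "=", value)                   # buys_computer = yes
```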
Items mapped to the same class are more similar to the other items in that class than to items in other classes, so distance (or similarity) measures may be used to identify the alikeness of different items in the database. Each item is placed in the class to which it is closest; this requires determining the distance between an item and a class (a sketch follows below).
Distance Based
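A minimal sketch of this idea for numeric attributes: represent each class by the centroid of its members and place an item in the class whose centroid is closest under Euclidean distance (the centroid representation and the data are illustrative assumptions):

```python
import math

classes = {
    "A": [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)],
    "B": [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
}

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(item):
    # Distance between an item and a class = distance to the class centroid.
    return min(classes, key=lambda c: math.dist(item, centroid(classes[c])))

print(classify((2.0, 2.0)))  # A -- closest to class A's centroid
```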
K nearest neighbors (KNN) is a common classification scheme. KNN classifiers are lazy learners, since they learn from the neighbors only when a new item must be classified. The training set includes the data along with their classes. To classify a new item, examine the K items in the training set closest to it; the new item is placed in the class to which most of those K closest items belong (a sketch follows under "KNN Algorithm").
KNN
K = 3: the three closest items in the training set to a new item T are examined; T will be placed in the class to which most of them belong.
KNN Algorithm
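A minimal Python sketch of the scheme just described, using Euclidean distance and a majority vote among the K closest training items (data are illustrative):

```python
import math
from collections import Counter

def knn_classify(training, item, k=3):
    """training: list of (point, class_label) pairs."""
    neighbors = sorted(training, key=lambda t: math.dist(t[0], item))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]   # class held by most of the k closest items

training = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
            ((8.0, 8.0), "B"), ((8.5, 9.0), "B")]
print(knn_classify(training, (1.2, 1.3)))   # A
```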