Prediction
2 Forms of Data Analysis in Extracting Models
• CLASSIFICATION
• PREDICTION
• Classification models predict categorical class labels, while prediction
models predict continuous-valued functions.
• For example, we can build a classification model to categorize bank
loan applications as either safe or risky, or a prediction model to
predict the expenditures in dollars of potential customers on
computer equipment given their income and occupation.
What is classification?
• A bank loan officer wants to analyze the data in order to know which
customers (loan applicants) are risky and which are safe.
• A marketing manager at a company needs to analyze a customer with a
given profile to determine whether that customer will buy a new computer.
• In both of the above examples, a model or classifier is constructed to
predict the categorical labels. These labels are risky or safe for loan
application data and yes or no for marketing data.
What is prediction?
• Suppose the marketing manager needs to predict how much a given
customer will spend during a sale at his company. In this example we are
concerned with predicting a numeric value. Therefore the data analysis
task is an example of numeric prediction. In this case, a model or a
predictor will be constructed that predicts a continuous-valued function
or ordered value.
• Note − Regression analysis is a statistical methodology that is most
often used for numeric prediction.
Typical applications
• Credit approval
• Target marketing
• Medical diagnosis
• Fraud detection
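The note on regression can be illustrated with a minimal sketch of simple linear regression (ordinary least squares with one input). The income and spending figures are hypothetical, chosen only for illustration, and the function name `fit_line` is an assumption, not something taken from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: customer income (in thousands) and the
# amount each customer spent during a sale.
incomes = [20.0, 30.0, 40.0, 50.0]
spending = [110.0, 160.0, 210.0, 260.0]

slope, intercept = fit_line(incomes, spending)

# Numeric prediction for a new customer with an income of 35 (thousands).
predicted = slope * 35.0 + intercept
print(slope, intercept, predicted)
```

The output is a continuous value rather than a class label, which is what separates numeric prediction from classification.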
How Does Classification Work?
The Data Classification process includes two steps −
• Building the Classifier or Model
• Using Classifier for Classification
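The two steps can be sketched as follows. This is a deliberately simple classifier that learns the majority class label for each value of a single attribute. It is an illustrative stand-in, not the specific learning algorithm the slides have in mind. The loan data, the attribute name `income`, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict


def build_classifier(training_tuples, attribute):
    """Step 1: learn the majority class label for each attribute value."""
    counts = defaultdict(Counter)
    for row, label in training_tuples:
        counts[row[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(model, row, attribute, default="unknown"):
    """Step 2: use the learned model to label a new, unseen tuple."""
    return model.get(row[attribute], default)


# Hypothetical training tuples: (attributes, known class label).
training = [
    ({"income": "low"}, "risky"),
    ({"income": "low"}, "risky"),
    ({"income": "low"}, "safe"),
    ({"income": "high"}, "safe"),
    ({"income": "high"}, "safe"),
]

model = build_classifier(training, "income")
label = classify(model, {"income": "high"}, "income")
print(model, label)
```

In practice the model built in step 1 would be evaluated on held-out test data before it is used to classify new tuples.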
Building the Classifier or Model
Using Classifier for Classification
Juana Magiting - How to Decide?
Sign with TV Network: P900,000, P900,000, P900,000
Prior Probabilities: 0.3, 0.6, 0.1
Juana Magiting Decision Tree
[Figure: decision tree with a decision node, a chance node, and branches for Event 1, Event 2, and Event 3]
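One common way to evaluate a chance node in a decision tree is to compute its expected monetary value: the sum of each payoff weighted by its probability. The sketch below applies that to the payoffs and prior probabilities listed above. It assumes that the three payoffs correspond, in order, to Event 1, Event 2, and Event 3, and that expected value is the decision criterion; the slides do not state either explicitly.

```python
def expected_value(payoffs, probabilities):
    """Expected monetary value of a chance node."""
    if len(payoffs) != len(probabilities):
        raise ValueError("each payoff needs exactly one probability")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in zip(probabilities, payoffs))


# Values from the "Sign with TV Network" row and the prior probabilities.
# Assumption: the i-th payoff belongs to the i-th event.
payoffs = [900_000, 900_000, 900_000]
priors = [0.3, 0.6, 0.1]

emv = expected_value(payoffs, priors)
print(emv)
```

Because the payoff is the same for every event, the expected value equals that payoff regardless of the probabilities. Any alternative in the tree would be evaluated the same way and the results compared.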
FOIL_Prune(R) = (pos − neg) / (pos + neg)
where pos and neg are the numbers of positive and negative tuples covered by R, respectively.
Note − This value will increase with the accuracy of R on the pruning set. Hence, if the
FOIL_Prune value is higher for the pruned version of R, then we prune R.
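A minimal sketch of this pruning test, assuming the usual definition FOIL_Prune(R) = (pos − neg) / (pos + neg), where pos and neg count the positive and negative tuples that R covers on the pruning set. The coverage counts in the example are hypothetical.

```python
def foil_prune(pos, neg):
    """FOIL_Prune value of a rule covering `pos` positive and `neg`
    negative tuples on the pruning set."""
    if pos + neg == 0:
        raise ValueError("the rule covers no tuples")
    return (pos - neg) / (pos + neg)


def should_prune(full_counts, pruned_counts):
    """Prune when the pruned version of the rule scores higher."""
    return foil_prune(*pruned_counts) > foil_prune(*full_counts)


# Hypothetical coverage on the pruning set, as (pos, neg) pairs.
full_rule = (8, 4)      # FOIL_Prune = 4 / 12
pruned_rule = (9, 3)    # FOIL_Prune = 6 / 12

decision = should_prune(full_rule, pruned_rule)
print(foil_prune(*full_rule), foil_prune(*pruned_rule), decision)
```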
Miscellaneous Classification Methods
• Genetic Algorithms
• The idea of genetic algorithms is derived from
natural evolution. In a genetic algorithm, an
initial population is created first. This initial
population consists of randomly generated
rules. We can represent each rule by a string of
bits.
For example, suppose the samples in a given
training set are described by two Boolean
attributes, A1 and A2, and the training set
contains two classes, C1 and C2.
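The bit-string representation can be sketched as follows. The encoding convention is an assumption made for illustration: the first two bits state whether A1 and A2 are required to be true (1) or false (0), and the last bit is 1 for class C1 and 0 for class C2. The function names are hypothetical.

```python
import random


def encode_rule(a1, a2, cls):
    """Encode 'IF [NOT] A1 AND [NOT] A2 THEN cls' as a 3-bit string."""
    return f"{int(a1)}{int(a2)}{1 if cls == 'C1' else 0}"


def decode_rule(bits):
    """Turn a 3-bit string back into a readable rule."""
    a1 = "A1" if bits[0] == "1" else "NOT A1"
    a2 = "A2" if bits[1] == "1" else "NOT A2"
    cls = "C1" if bits[2] == "1" else "C2"
    return f"IF {a1} AND {a2} THEN {cls}"


def initial_population(size, rule_length=3, seed=None):
    """Create the initial population of randomly generated rules."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice("01") for _ in range(rule_length))
        for _ in range(size)
    ]


rule = encode_rule(True, False, "C2")   # IF A1 AND NOT A2 THEN C2
population = initial_population(size=4, seed=0)
print(rule, decode_rule(rule), population)
```

A full genetic algorithm would go on to score each rule on the training set and apply operators such as crossover and mutation to produce the next generation. That part is not shown here.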
Miscellaneous Classification Methods
• Rough Set Approach
• We can use the rough set approach to discover
structural relationships within imprecise and noisy
data.
• Note − This approach can only be applied to
discrete-valued attributes. Therefore, continuous-valued
attributes must be discretized before it is used.
• Rough set theory is based on the establishment
of equivalence classes within the given training data.
The tuples that form an equivalence class are
indiscernible. This means the samples are identical
with respect to the attributes describing the data.
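The notion of equivalence classes can be made concrete with a short sketch: tuples are grouped together when they have identical values on the chosen attributes, so tuples in the same group are indiscernible. The sample data and attribute names are hypothetical, and all attributes are assumed to be discrete already.

```python
from collections import defaultdict


def equivalence_classes(tuples, attributes):
    """Group tuple indices whose values agree on every given attribute."""
    groups = defaultdict(list)
    for index, row in enumerate(tuples):
        key = tuple(row[a] for a in attributes)
        groups[key].append(index)
    return list(groups.values())


# Hypothetical training tuples with discrete-valued attributes.
samples = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "low"},
    {"age": "old", "income": "high"},
]

classes = equivalence_classes(samples, ["age", "income"])
by_income = equivalence_classes(samples, ["income"])
print(classes, by_income)
```

Using fewer attributes makes the classes coarser: tuples 0, 1, and 2 become indiscernible when only `income` is considered.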
Rule Based Reasoning (RBR)
• Rule Based Reasoning (RBR) requires us to elicit an explicit model of the
domain. As we all know and have experienced, knowledge acquisition has a
set of associated problems.