• Retailing
• Weather Forecasting
• Traffic Congestion
Algorithms
• Data mining algorithms traditionally fall
into one of four broad categories:
• Classification
• Clustering
• Association
• Sequence discovery
• Classification, or supervised induction, is perhaps the
most common of all data mining activities. The objective
of classification is to analyze the historical data stored in
a database and to automatically generate a model that
can predict future behavior.
• This induced model consists of generalizations over the
records of a training data set, which help distinguish
predefined classes.
• The hope is that this model can then be used to predict
the classes of other unclassified records.
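As a rough illustration of inducing a model from classified records and applying it to an unclassified one, here is a minimal sketch. It assumes a nearest-centroid classifier, which is only one of many possible techniques and is not named in the slides. The function names, the two features, and the class labels are hypothetical.

```python
from collections import defaultdict

def train_centroids(records):
    """Induce a model: one mean feature vector (centroid) per predefined class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in records:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(model, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical historical records: (monthly_visits, avg_basket_value) -> class
training = [
    ((1, 20.0), "occasional"),
    ((2, 25.0), "occasional"),
    ((8, 60.0), "loyal"),
    ((10, 70.0), "loyal"),
]
model = train_centroids(training)      # generalization over the training set
predicted = predict(model, (9, 65.0))  # classify a previously unseen record
```

The model here is just the per-class averages; real classifiers generalize in more sophisticated ways, but the train-then-predict shape is the same.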
• Common tools used for classification are neural networks,
decision trees and if-then-else rules that need not have a
tree structure.
• Neural networks involve the development of
mathematical structures with the ability to learn.
• Decision trees classify data into a finite number of
classes, based on the values of the variables. DTs
consist essentially of a hierarchy of if-then statements
and are therefore typically faster to build and apply than neural nets.
• Rule induction: the extraction of useful if-then rules
from data based on statistical significance. The if-then
statements used here need not be hierarchical.
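The difference between a hierarchy of if-then statements and a flat set of rules can be sketched as follows. Both models are written by hand purely for illustration; in practice they would be induced from data. The field names, thresholds, and class labels are hypothetical.

```python
def tree_classify(record):
    """Decision tree: nested if-then tests, each branch refining the one above it."""
    if record["income"] >= 50000:
        if record["has_defaulted"]:
            return "review"
        return "approve"
    if record["years_employed"] >= 5:
        return "review"
    return "reject"

# Flat rule list: each rule is an independent (condition, class) pair,
# with no parent/child relationship between rules.
RULES = [
    (lambda r: r["has_defaulted"] and r["income"] < 50000, "reject"),
    (lambda r: r["income"] >= 50000 and not r["has_defaulted"], "approve"),
]

def rule_classify(record, default="review"):
    """Return the class of the first rule that fires, else a default class."""
    for condition, label in RULES:
        if condition(record):
            return label
    return default

applicant = {"income": 62000, "has_defaulted": False, "years_employed": 3}
tree_result = tree_classify(applicant)
rule_result = rule_classify(applicant)
```

Every record follows exactly one path through the tree, whereas a rule list may leave some records uncovered, which is why the sketch falls back to a default class.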
• Clustering partitions the database into segments
whose members share similar qualities.
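One common way to partition records into segments is k-means, shown here as a minimal sketch. The slides do not name a specific clustering algorithm, so this choice is an assumption, as are the sample points and the naive initialization from the first k points.

```python
def kmeans(points, k, iterations=10):
    """Partition points into k segments by repeatedly reassigning to the nearest center."""
    centers = [list(p) for p in points[:k]]  # naive deterministic initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest center.
        for idx, p in enumerate(points):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centers

# Hypothetical two-feature records forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
assignment, centers = kmeans(points, k=2)
```

Unlike classification, no class labels are supplied; the segments emerge from the similarity of the records themselves.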
• Associations establish relationships between items that
occur together in a given record.
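Associations between items in the same record are often summarized with support (how often a pair occurs) and confidence (how often the second item appears given the first). The sketch below computes both for item pairs. The measures, the `min_support` threshold, and the sample baskets are assumptions added for illustration and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Count item pairs occurring in the same record and derive support/confidence."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates, fix a canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Rule a -> b: confidence is the share of a's records that also hold b.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical retail baskets.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
rules = pair_rules(transactions)
```

Each key `(a, b)` reads as "records containing a also tend to contain b", with the value holding `(support, confidence)`.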
• Sequence discovery can be looked at as the
identification of associations over time. When
appropriate information is available (for instance, the
identity of a customer in a retail shop), a temporal
analysis can be conducted to identify behavior over time.
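The temporal analysis described above can be sketched by grouping purchases by customer identity, ordering them in time, and counting which items tend to follow which. The event layout `(customer, day, item)` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def sequential_pairs(events):
    """Count how many customers bought item A and then, at a later time, item B."""
    history = defaultdict(list)
    for customer, day, item in events:
        history[customer].append((day, item))
    counts = Counter()
    for purchases in history.values():
        purchases.sort()  # order each customer's purchases by day
        seen = set()
        for i, (day_a, a) in enumerate(purchases):
            for day_b, b in purchases[i + 1:]:
                if day_b > day_a and a != b:
                    seen.add((a, b))
        counts.update(seen)  # each customer counts once per pattern
    return counts

# Hypothetical purchase log: (customer id, day of purchase, item).
events = [
    ("c1", 1, "printer"), ("c1", 9, "ink"),
    ("c2", 3, "printer"), ("c2", 4, "paper"), ("c2", 20, "ink"),
    ("c3", 5, "ink"),
]
patterns = sequential_pairs(events)
```

Without the customer identity, the same purchases could only be analyzed as unordered co-occurrences; the identifier is what makes the ordering meaningful.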